SERIES FOREWORD


Basic Concepts in Psychology was conceived as a series of brief
paperback volumes constituting a beginning textbook in psychology.
Several unique advantages arise from publishing individual chapters as
separate volumes rather than under a single cover. Each book or chapter
can be written by an author identified with the subject matter of the
area. New chapters can be added, individual chapters can be revised
independently, and, possibly, competitive chapters can be provided for
controversial areas. Finally, to a degree, an instructor of the beginning
course in psychology can choose a particular set of chapters to meet the
needs of his students.

Probably the most important impetus for the series came from the
fact that a suitable textbook did not exist for the beginning courses in
psychology at the University of Michigan: Psychology 100 (Psychology
as a Natural Science) and Psychology 101 (Psychology as a Social
Science). In addition, no laboratory manual treated both the natural
science and social science problems encountered in the first laboratory
course, Psychology 110.

For practical rather than ideological reasons, the initial complement
of authors comes from the staff of the University of Michigan.
Coordination among geographically dispersed authors seems needlessly
difficult, and the diversity of points of view in the Department of
Psychology at Michigan makes the danger of parochialism quite small.

Each author in the Basic Concepts in Psychology Series has
considerable freedom. He has been charged to devote approximately half
of his resources to elementary concepts and half to topics of special
interest and emphasis. In this way, each volume will reflect the
personality and viewpoint of the author while presenting the subject
matter usually found in a chapter of an elementary textbook.
INTRODUCTION


1 


The sea contains many small organisms that have little or no capacity
as individuals to adjust to minor changes in the environment. The
metabolism of these passively floating organisms depends directly on a
narrow range of temperature and chemical conditions. Their survival
through the ages has depended on variation among individuals: those
best suited to slight environmental changes have survived; those least
suited have perished.

As one proceeds up the phylogenetic scale, the capacity to behave
increases, and the behavior available to the organism in adjusting to his
environment becomes more complex. The behavior of some species,
although complex, remains rigid and stereotyped. Such behavior is
referred to as instinctive: little capacity to learn is in evidence, and
there is great dependence on a particular environmental pattern. In other
species, there appears to be more capacity to vary behavior in response
to environmental change, more capacity for learning, and a wider range
of environments in which an individual can survive. Thus variability in
the individual’s behavior, and individual survival under changing
circumstances, becomes an important mechanism in the survival of the
species.

Man can live and thrive in most of the environments on earth and
is less subject than most species to extinction by gradual changes in the
environment. Of all living creatures, he seems to acquire the greatest
store of information, to develop the greatest repertory of motor skills,
and to modify his behavior, through learning, to meet the greatest range
of situations. There seems to be an evolutionary process in which men
with the greatest ability to learn and to vary their behavior survive
at the expense of those with less flexible behavior and less learning
capacity.

WHAT IS LEARNING? 

Let us say tentatively that learning is a change in performance that
occurs as a result of experience. This short and simple statement appears
to include everything that one would want in a definition of learning.
It clearly implies the concept of variability; variability is a prerequisite
to any change in performance. Furthermore, we used the phrase “change
in performance” in contrast to the more commonly used “improvement
in performance,” since one can acquire bad habits as well as good ones.
The word “improvement” would eliminate many interesting problems
from the study of learning.

The major difficulty with this simple definition is that there are
times when we can attribute “changes in performance that occur as a
result of experience” to factors other than learning. Before attempting
a more precise definition of learning, we should consider some of these
factors: maturation, fatigue, motivation, and changes in the stimulus
situation.

STIMULUS-INDUCED MATURATION 

The concept of maturation usually implies the appearance of
capacities or behaviors that are due to the nature of an organism and
not to its experience. For example, newborn puppies come into the world
with closed eyes, as do the newborn of many other species. When a
pup’s eyes become fully open as he matures, his behavior changes.
There is little doubt that this change in behavior is the result of
maturation, not of experience or learning. On the other hand, maturation
itself sometimes seems to be a product of stimulation. Riesen (1950) has
shown that young chimpanzees reared through an early critical stage
of their lives without experiencing light do not develop properly: their
vision is irreparably damaged. Thus, in certain instances, the distinction
between maturation and learning is far from clear cut. Moreover, we
do not yet know exactly which aspects of development can be attributed
to maturation and which require stimulation and thus can be attributed
to learning. The history of psychology records many lengthy debates,
none of which has produced a clear answer. Another volume in this
series (McNeil, The Concept of Human Development, 1966) emphasizes
maturation in the context of normal development, while the present
volume is concerned with acquired behavior in which maturation plays
less of a role. Our definition of learning will exclude behavior that
results from normal maturation.

FATIGUE 

Practice makes perfect; it also makes one tired. Although fatigue
can produce “a change in performance that occurs as a result of
experience,” we do not consider such a change to be learned. The exhausted
behavior of the tired runner disappears with rest, and his former swiftness
is recovered. Learned behavior is more permanent. Thus we can use the
relative permanence of behavioral changes to distinguish between
learning and fatigue.





MOTIVATIONAL CHANGES

We also want to distinguish between motivation and learning.
Learned acts are usually performed well only when one is motivated to
perform them, and practice can sometimes produce changes in motivation
that result in changes in performance. Such changes, however, are
not the result of learning. For example, suppose we want to have a rat
learn the shortest path from the start to the goal of a complicated maze.
To make sure that the animal will be motivated to learn, we deprive him
of food for a day and use food as a reward by placing it in the goal box.
Each time the rat reaches the goal box, we allow him to eat for a while
and then we place him back in the starting box. For a number of trials,
the animal’s performance keeps improving. He runs faster each time and
makes fewer errors. But then he begins to run more slowly, to wander
around in blind alleys; eventually he does not run at all but curls up
and goes to sleep somewhere in the maze. He has, we think, no further
interest in getting to the goal box for more food. His behavior has
changed, and the change has occurred as a result of experience, but
was the change due to motivation rather than learning? We can deprive
him of food for a second day and then place him back in the maze. If he
runs rapidly to the goal with few errors, we can be relatively sure that
the performance change near the end of the first day was the result of
a change in motivation. We do not wish to attribute such a change
to learning.

CHANGES IN THE STIMULUS SITUATION 

By learning, we mean some change that occurs inside the organism,
probably in the nervous system. In laboratory studies of learning, an
effort is made to keep the stimulus environment constant during the
learning process, so that changes in behavior can be identified as changes
in the organism and not in the external stimulus. In many ordinary
situations in which learning occurs, what an organism does changes the
environment. Such a change may in turn cause a new mode of behavior
to appear. We do not wish to describe such behavior as learned, even
though it might occur as the result of practice or experience.

This list of experiential factors that we wish to exclude from our
definition of learning is already long, but is not exhaustive. The list should
be left open to accommodate future developments. With this in mind,
we revise our definition of learning:

Learning is a change in performance that occurs as a result of
experience and is not attributable to maturation, fatigue, motivation,
changes in the stimulus situation, or to other identifiable nonlearning
factors.
THE CLASSES OF LEARNING

Because of its diversity, we choose to divide the realm of learning
into three sub-areas: the learning of skills, the learning of facts, and the
transfer of responses.¹

This volume deals with the third area of learning, which is primarily 
concerned with the transfer of a response from one stimulus to another, 
the selection of one response from a set of responses which occur naturally 
in a situation, and changes in rate of emission or performance of responses. 
We will treat these problems in order of the amount of variability 
involved. Thus we will begin with instinctive behavior, in which stereo- 
typy and rigidity are characteristic and in which learning is probably a 
negligible factor. Then we will take up imprinting, which involves 
complex patterns of behavior that can be linked to a wide variety of 
stimuli. Imprinting represents a slight increase in variability and intro- 
duces something like learning. Next, early-experience research illustrates 
the dependence of later learning on early learning. Classical conditioning
(Chapters 3 and 4) involves more flexibility in both stimuli and responses.
In conditioning, true learning occurs, but the requisite variability is at
a minimum. Instrumental learning (also discussed in Chapters 3 and 4)
involves even more flexibility in behavior and is based on reward or
reinforcement. With operant conditioning (Chapter 5), a special form 
of selective learning, and imitation and modeling (Chapter 6), we will 
reach a maximum of flexibility in behavior. 

The final chapter is a brief introduction to mathematical models,
which are of increasing importance in learning theory, and in which the 
full range of variability can be represented. 

¹The learning of skills, especially motor skills, is referred to as perceptual-motor
learning. The phenomena of perceptual-motor learning can be and often are treated
almost independently of other kinds of learning. Separate treatment is given to the
learning of perceptual-motor skills in another volume in this series (Fitts and Posner,
Human Performance, 1967). Similarly, human information processing and human verbal
learning are treated separately, as in this series (see Manis, Cognitive Processes, 1966).
A minimum of behavioral variability and, consequently, a minimum
role for learning is evident in the three classes of behavior to be
considered in this chapter. The classes are instinctive behavior, imprinting,
and early experience (of certain kinds).

INSTINCTIVE BEHAVIOR 

The term instinctive refers to complex behavior that appears to
develop without the benefit of learning from prior experience. Instinctive
behavior is usually stereotyped and thus not ordinarily modified through
learning.

The sex life of the three-spined stickleback (Gasterosteus aculeatus)
as described by Tinbergen (1952) is a good illustration of instinctive
behavior. The stickleback is a small, common fish found in the shallow
streams, canals, and ditches of Europe. Tinbergen (1951) distinguished
four large sequential segments of the reproductive behavior of the
male stickleback. In the spring (1) the male establishes a territory and
fights with other male sticklebacks. (2) He then builds a nest, and
(3) he begins courting and mating behavior. Finally, (4) he develops
a brood. Each of these four major phases consists of chains of simpler
segments of behavior, each initiated and controlled by a limited set of
highly specific external stimuli, called sign stimuli by Tinbergen.

The fighting behavior of the stickleback is directed almost
exclusively toward other males with nuptial markings. Since the throat
and belly of the male turn red during the mating season, Pelkwijk and
Tinbergen (1937) suspected that the red color was the important stimulus
that elicits fighting behavior. To test their hypothesis they constructed
models of sticklebacks (see Figure 2.1). Some models (Series N) were
shaped like sticklebacks but were silver and green with no red. Another
set of models (Series R) lacked many of the figural characteristics of
sticklebacks or even fish, but they had a red underside. When both kinds
of models were presented to the male stickleback, he attacked those with
red undersides more vigorously than the more accurately formed
silver-and-green models. In one report, Tinbergen (1952) relates that a red
mail truck driving past the window at a distance of more than 100 yards
could induce the male to attack the side of the tank vigorously.








Figure 2.1

Models used in tests of behavior in the three-spined stickleback. The
heavily stippled lower portion of the models in series R is red.
(Adapted from N. Tinbergen, The Study of Instinct, 1951 [Clarendon
Press, Oxford], by permission.)

It is obvious that the stickleback was able to perceive the realistic
silver-and-green model but that his instinctive fighting behavior was
elicited only by red coloration. Limited sets of stimuli which elicit
complex instinctive behavior are called sign stimuli. Since the
stickleback’s fighting behavior is elicited only by sign stimuli, and only
when the stickleback is in an appropriate physiological state signaled by
the appearance of the red markings in the spring of the year, Tinbergen
concludes that there must be an innate releasing mechanism that responds
to sign stimuli and is responsible for the instinctive complex of behavior.

As Tinbergen (1952) demonstrates, the stickleback’s mating behavior
occurs in a very definite and unalterable sequence in which the completion
of one phase is necessary to provide the sign stimuli for eliciting the next
phase. The stickleback does not begin to build his nest until he has
defended the territory for a while. As the first step in nest building, the
stickleback digs a small hole. If someone fills up the hole, the stickleback
will dig again. Only after many failures will he proceed to build the
superstructure without a pit or depression under it. The next phase
(courting) normally will not occur until the nest-building phase is
completed. If a female enters the territory before the nest is complete, she
will be driven away or at best greeted with a few abortive zigzags,
which are the beginnings of the courting behavior.

The clear sequence of “bits” of behavior is best seen in the actual
mating. When all is ready, the appearance of a female with an
egg-swollen belly elicits in the male a characteristic zigzag dance. The dance
elicits courting behavior on the part of the female, which in turn causes
him to lead her to the nest. He indicates the entrance to the nest and
she enters. The male quivers as he nudges the female near her tail. This
induces her to spawn. The presence of eggs induces the male to fertilize
them. According to Tinbergen, each step in this sequence can be shown
to depend upon a limited set of sign stimuli, very much as the fighting
behavior was shown to depend on a red belly. For example, once the
female has entered the nest, she can be induced to spawn by stimulation
with a glass rod even though she has seen the male who led her there
removed from the tank.

The stickleback’s behavior is clearly hierarchical, as Tinbergen
(1951) indicated. The reproductive instinct, the largest unit, is revealed
in the sequence of fighting, nest building, mating, and the raising of the
brood. Each of these segments is composed of smaller units in the
hierarchy. For example, the male’s mating behavior consists of a zigzag dance,
leading the female to the nest, showing her the entrance, quivering to
induce her to spawn, and fertilizing the eggs.

This sort of instinctive behavior is complex, inflexible, automatic,
and mechanical. It shows very little variability and no learning. When
interfered with, instinct appears blind and stupid, as in the case of the
stickleback who attacked the mail truck.

IMPRINTING

A second class of behavior in which variability and learning are
minimal is imprinting. The term applies to the attachment of a complex
behavior pattern to a stimulus which happens to be present at the right
moment. This is in contrast to the link between instinctive behavior
and highly specific sign stimuli. An example of imprinting is the tendency
of newborn animals or newly hatched fowl to follow whatever object is
seen first. Konrad Lorenz (1935), the European naturalist, noted that
if incubator-hatched ducklings saw only him when they were first
removed from the incubator, they tended to follow him around very
much as normal ducklings tend to follow their mothers.
Imprinting manifests nearly all of the characteristics of instinctive
behavior. However, since imprinting depends to some extent upon the
experience of the organism, learning is involved, although of a special
limited kind. The limitations of the adaptiveness of imprinting can be
seen in the way cichlid fish (Cichlidae) identify their young. Baerends
and Baerends (1950) report that some of these perch-like fish “learn”
to confine their parental behavior to the young of their own species from
their experience the first time they raise a brood. This “learning” was
demonstrated by the following experimental procedures: Using a pair of
young fish about to hatch their first brood, the experimenters exchanged
the eggs for those of another species. The parents accepted the brood
that hatched and raised them as their own. In subsequent seasons,
however, they killed their own young as soon as they hatched, and would
never again raise young of their own species, although they would raise
a brood of the species they had first raised.

Following Lorenz’s report (1935) of imprinting in birds, a great
many studies were carried out to test for imprinting in species ranging
from insects to mammals. The results were generally positive, indicating
that the mechanism of imprinting is widespread and general.

Some of the most interesting, extensive, and experimentally precise
studies have been reported by Hess (1958, 1959, 1964). In most of his
experiments, he observed ducklings and young chickens in the laboratory
or under carefully controlled conditions on an experimental farm. The
procedures used in some of the early laboratory experiments (Hess, 1959)
illustrate the general nature of the experimental methods used. Eggs of
relatively wild mallard ducks were incubator-hatched. The newly hatched
birds were placed in small cardboard boxes so they could see very little
in the dim light. The duckling could be released in the imprinting
apparatus by remote control. The apparatus (see Figure 2.2) consisted of a
circular runway formed by plexiglas walls; the runway was about 12
inches wide, 12.5 feet in circumference, and about 5 feet in diameter.
A mallard duck decoy was suspended about 2 inches above the runway.
The decoy contained a loudspeaker and a heating unit and could be
moved around the runway. The movements of both the decoy and the
duckling were recorded automatically, and the duckling could be
returned to its box automatically through a trapdoor.

Figure 2.2

The apparatus used in the study of imprinting.
(Adapted from Hess, 1959, by permission of the
author and publisher.)

For the imprinting process, the duckling would be released in the
apparatus about a foot behind the decoy, the sound would be turned on,
and the decoy would emit a human rendition of “gock, gock, gock, gock.”
Shortly afterward, the decoy would begin to move. The duckling would
be permitted to remain in the apparatus a given length of time or to
make a specific number of turns before being returned to his small box
to await testing.

After the imprinting procedure, two models were placed in the
apparatus: the decoy model of a male that had been used during the
imprinting procedure, and a female model that differed only in coloration.
The duckling was released halfway between the models and given a
minute in which to make a decisive following response to one of the
silent models. When sound was turned on, the male model emitted the
same “gock” sound that had been used in imprinting, and the female
emitted a recording of the sound of a real female mallard calling her
young. Tests were then run in which the ducklings could choose to
follow one or the other of the models. Different tests involved various
combinations of the two models, which might be silent or calling or
might be stationary or moving. During the tests most ducklings responded
to the male model, thus showing imprinting.

One characteristic that distinguishes imprinting from learning is the
critical period. It is generally thought that the time during which
imprinting can occur is limited to a brief period in the life of the organism.
Hess carried out the imprinting procedure with groups of ducklings
from one to 32 hours after hatching. Figure 2.3 shows that those
ducklings imprinted when they were 13 to 16 hours old showed the greatest
effectiveness of the procedure.

The dramatic rise and fall of the curve in Figure 2.3 invites
speculation. According to Hess (1959), both are attributable to the amount
of “following” behavior that occurred at each age level. The amount of
following increased with a growing locomotor capacity after hatching,
and then decreased with a growing capacity for fear and resultant
unwillingness to follow a strange object. To support his hypothesis, Hess
kept the length of time the duckling was exposed to the model constant
in one experiment but varied the distance that the model traveled during
the imprinting process. The results can be seen in Figure 2.4, where
the relationship between the imprinting score and the distance traveled
is clear. In a second study, in which the exposure time was varied but
the distance was held constant, no difference in effectiveness was found.
In a study with newly hatched White Rock chicks, the chicks displayed
no fear up to 13 to 16 hours after hatching. After this point, the
percentage showing fear increased until all animals showed fear at 33 to 36
hours after hatching. Hess’s evidence supporting a “critical period” for
imprinting is fairly persuasive. Of the large number of variables studied
by Hess, several are of special interest or importance. In one set of
studies (1964), birds were shocked during the imprinting procedure.
Under most circumstances, shock will produce avoidance behavior. But
Hess generally obtained more following behavior and stronger
imprinting with shock than without. The maximum effect of shock occurred
when the birds were 18 hours old. The difference between the disruptive
effect of shock in learning and the enhancing effect of shock in imprinting
is one of the differences on which Hess bases the conclusion that learning
and imprinting are clearly distinguishable.

Figure 2.3

If an animal made a positive response to the male
decoy on which it had been imprinted in each of the
four test conditions, it was given a score of 100
percent, and imprinting was regarded as complete. The
figure shows the percent of animals of each age
that were completely imprinted according to this
criterion. (Adapted from Hess, 1959, by permission
of the author and publisher.)

In another set of studies, chicks were given an opportunity to obtain
food by pecking at a green triangle but not at a blue circle, for which
they showed a preference before reinforcement. When the chicks were
rewarded or reinforced for pecking at the green triangle for two hours
on the third day after hatching, something like imprinting, as
distinguished from normal learning, occurred. Normally, when such a response
is rewarded, or reinforced, and the reinforcement is subsequently
removed, the response will gradually disappear; it will undergo
“experimental extinction.” (When the training took place before or after the
third day, the response tended to disappear more or less rapidly when it
was no longer rewarded.) But when the training occurred on the third
day, the response was highly resistant to extinction and persisted without
further reinforcement over a number of days. Hess cites this difference
in behavior as further evidence for a difference between imprinting and
learning.

Figure 2.4

Strength of imprinting as a function of the distance
traveled by ducklings, with exposure time held
constant. (Adapted from Hess, 1959, by permission
of the author and publisher.)

Hess has also examined the effects of drugs on imprinting. Some
of these tranquilizing drugs either prolonged the period during which
imprinting could occur (presumably by reducing the fear response) or
prevented imprinting from occurring (presumably by reducing the
capacity for effortful muscular response [Hess, 1959]). In studies of
the reinforcement of pecking the green triangle, drugs were used which
either had not affected normal learning or else had enhanced it. These
drugs prevented the imprinting of the pecking response to the green
triangle on the third day (Hess, 1964).

These imprinting studies were selected to illustrate several points
concerning both the general character of the imprinting process and
the differences between imprinting and learning. Lorenz (1935) regarded
imprinting as the establishment of an emotional tie between the young
and the parent. While many of the studies of imprinting have dealt
with the attachment of following behavior to some stimulus pattern,
usually that of one of the parents, not all imprinting is of this type. The
imprinting of the pecking behavior of chicks on the stimulus pattern
associated with reward makes clear that the process of imprinting is a
somewhat more general mechanism in animal behavior.

Imprinting is the association of a complex pattern of behavior with
a stimulus complex or pattern that happens to be present during the
first or early occurrences of the behavior. Thus, it is behavior that varies
with the particular experience of the organism and, in a sense, is a
primitive form of learning.

However, as Hess (1964) has pointed out, imprinting has a number
of special characteristics that set it apart from most learned behavior:
(1) Imprinting can occur only during a limited or critical period.
(2) Imprinting is prevented by certain drugs, such as tranquilizers.
(3) Imprinting is most effective when the training experience is confined
to a relatively short space of time and is thus massed rather than spaced
(see Hess, 1964). (4) The behavior to be imprinted and the imprinting
itself are both accelerated by a noxious stimulus applied during the
imprinting process. (5) In imprinting, the first experience with a situation
is most important. As we shall see subsequently, more complex forms
of learned behavior tend to exhibit characteristics quite different from
those of imprinting.

EARLY-EXPERIENCE STUDIES 

The earliest experiences in the life of an individual organism are
sometimes given overriding importance in theories of personality and
psychological development. (Examples will be found in McNeil, The
Concept of Human Development, 1966, and Blum, Psychodynamics:
The Science of Unconscious Mental Forces, 1966, both in this series.)
The designation early-experience studies is used to refer to studies
demonstrating the necessity of certain kinds of early experience for the
normal development of an organism. The absence of such early experiences
and opportunities for learning appears to restrict the capacity for later
learning and thus to prevent normal development.

The most explicit characterization of the differences between early
and late learning was developed by Hebb (1949). He hypothesized two
kinds of neurophysiological mechanisms underlying the two kinds of
learning. The earliest learning, in Hebb’s language, consists of the
establishment of cell assemblies, the most fundamental building blocks
of behavior. When a child learns to identify an object, this identification
involves a specific neural structure, or cell assembly, which becomes
organized during his repeated experiences with the object. In sharp
contrast to imprinting, the formation of cell assemblies is assumed to take
place slowly and to involve repeated experience. Later learning,
according to Hebb, consists largely in the formation of phase sequences, as
cell assemblies are chained together to produce complex meaningful
behavior. Thus cell assemblies might underlie our understanding of
individual words, while the chaining of words into meaningful language
might represent the formation of phase sequences. In contrast to the
slow, early learning and relative stability of cell assemblies, phase
sequences are assumed to be learned later and more rapidly and to be
easily modified. The concepts cell assembly and phase sequence are
basic to the early-experience studies stimulated by Hebb’s work.

This is the line of reasoning that supports early-experience studies:
If an animal is deprived of some of the normal childhood (or
puppyhood) experiences that normally result in the formation of the requisite
cell assemblies, later learning, since it consists of the chaining of cell
assemblies into phase sequences, will be distorted or impossible. It is
thus clear why the earliest experiences of the organism are considered
to be of overriding importance.

In early-experience studies, either animals are deprived of certain
normal early experiences or the character of the early experience is
rigidly controlled. Efforts are then made to determine the effects of these
abnormal early conditions on the later behavior of the animals. One of
the most comprehensive of these studies, involving the isolation of Scottie
dogs, was carried out in the laboratories at McGill University. Among
numerous reports published on these studies are those of Melzack and
T. H. Scott (1957) and Thompson and Melzack (1956). J. P. Scott (1962)
made a general review of critical periods in behavior development.

The McGill Scotties were isolated after they were weaned at about
four weeks of age. They remained isolated in cages until they were
about eight months old and were essentially adult. The ten isolated
dogs could not see out from their cages although light was admitted from
above. The cages had two compartments with a sliding door between
to permit provision of food and water and cleaning of the compartments.
Melzack and Scott (1957) compared the behavior of these ten isolated
dogs with that of twelve littermates who were raised as pets in homes
or had a “normal” or “unrestricted” rearing in the laboratory. After the
isolated dogs were released from confinement, they were exceptionally
active and playful. According to Thompson and Melzack (1956) the
isolated dogs showed a “puppy-like exuberance” that seemed strange in
an animal that appeared to be physically adult. Thompson and Melzack
also report more formal tests of exploratory behavior. Several normal
and several restricted dogs were tested individually in a small room for
30 minutes on each of four days. The normal dogs soon became bored
with the monotony of the room and quietly relaxed. The restricted dogs
continued to explore for a “considerably longer time.” Similar results were
obtained in a series of four 10-minute tests in a maze where the amount
of activity of the dogs could be easily quantified.

The reactions of the dogs were also observed in the presence of a
series of strange objects such as a human skull, a slowly filling balloon,
and an open umbrella. The normally-reared Scotties ran away from such
objects. The isolated Scotties “became highly agitated, jumped back and
forth near the object, whirled around it, stalked it. . . .” Their behavior
was “diffuse” or “undifferentiated.” When the isolated dogs were tested
again a year later, their behavior was somewhat similar to the original
avoidance behavior of normal dogs. The normal dogs, on the other hand,
now “attacked the objects with playful aggression,” rather than showing
fear.

In other tests, the isolated dogs showed “marked deficiencies in
learning and problem solving” (Hebb, 1958). In one such test (Thompson
and Heron, 1951; Thompson and Melzack, 1956) the dogs were trained
to find food by running along a wall from one corner of a room to the
next. The food was then placed in another corner of the room in full
view of the dog, and the pan was banged on the floor. The normal dogs
usually ran straight to the new position. The restricted dogs were much
less efficient, often running first toward the corner where the food had
been previously.

The dogs were also presented with a classic test of animal
“intelligence,” the detour problem. Food was placed behind a wire screen. To
reach the food, an animal had to “detour” around the screen. Normal
dogs were reported to solve this problem in one or two trials; the isolated
dogs spent much time in front of the food, pawing the screen, trying to
push their muzzles through, or otherwise trying vainly to get the food
that was behind the wire. The restricted dogs were described as
displaying “strikingly unintelligent behavior.”

One further study involved a series of 18 maze problems that tested
a wide variety of abilities. This study might be considered as a test of
animal intelligence. All of the dogs were given preliminary training in
maze running through simpler mazes, so that they were proficient in
performing in mazes of this type. The restricted animals were markedly
inferior in their performance on the 18 more difficult mazes. Thompson
and Melzack (1956) report that animals that had been out of the
restricted environment for several years were still inferior in
performance to the normal dogs, indicating that the retardation imposed was
more or less permanent.

Probably the most dramatic effect of the isolation of the dogs is
revealed in tests for the reaction to or perception of pain. Two studies
involved the reaction to an electric shock. In one study, the Scotties were
placed in an enclosure that measured 3 feet by 6 feet. They were then
pursued by a small, remote-control, toy automobile which delivered a
strong electric shock on contact. The restricted dogs did not really learn
to avoid the car and received more than four times as many shocks on
the average as the normal dogs. Two restricted animals tested two years
after release received nine and 23 shocks respectively when the average
number of shocks received by normal dogs was six. In the second study
involving shock, the enclosure was divided in the middle by a 3-inch
barrier. The grid floor could be charged to deliver a strong shock. The
dogs were placed individually on the side they preferred. The
experimenters then made a rule: After one minute, a second of strong shock
would be delivered through the floor of that side. However, if the Scottie
jumped the barrier during the minute, it could escape the shock. After
an escape, the animal was replaced on the shock side. Ten tests were
given each day for three days for a total of 30 trials. No further tests were
run if an animal avoided the shock five trials in a row. Ten of the twelve
normal dogs reached this criterion, while only two of the nine restricted
animals did so. Tests on two restricted dogs two years later seem to
indicate that the deficiency persisted over that length of time. The authors
attribute the differences in the behavior of the two groups of dogs to
differences in pain perception. They report that the normal and restricted
dogs did not differ either in the threshold for shock or in their reactions
to minimum values of shock. The strong shocks used in the tests were
obviously painful to the normal dogs.
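
The stopping rule of the barrier test (at most 30 trials, with testing
ended once an animal avoids the shock on five trials in a row) can be
sketched in a few lines of Python. This is only an illustration of the
rule as described above. The function name, the True/False coding of
trials, and the two sample records are assumptions made for the example
and are not data from the McGill studies.

```python
from typing import Optional, Sequence


def trials_to_criterion(avoided: Sequence[bool],
                        run_length: int = 5,
                        max_trials: int = 30) -> Optional[int]:
    """Return the trial on which the criterion run is completed.

    avoided[i] is True when the animal jumped the barrier before the
    shock on trial i + 1. None means the criterion was never reached
    within max_trials.
    """
    streak = 0
    for trial_number, did_avoid in enumerate(avoided[:max_trials], start=1):
        # A failure to avoid resets the count of consecutive avoidances.
        streak = streak + 1 if did_avoid else 0
        if streak >= run_length:
            return trial_number
    return None


# Hypothetical records (not observed data).
quick_learner = [False, False, True, True, True, True, True]
never_learns = [False, True] * 15

print(trials_to_criterion(quick_learner))
print(trials_to_criterion(never_learns))
```

Under these assumptions the first record meets the criterion on its
seventh trial, and the second never does, because no run of five
consecutive avoidances occurs in its 30 trials.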

The conclusion that rearing in the isolated environment has affected
the capacity to perceive pain was tested further by means of two kinds
of tests. In one, a lighted match was brought near the animal’s nose.
Normal dogs either will avoid the flame immediately or will quickly
learn to do so. Seven of ten restricted dogs made no attempt to get away
from the nose-burning match. They moved their noses into the flame as
soon as it was presented. They then jerked their heads or whole bodies
away “as though reflexively.” But they returned to the flame, hovered
excitedly rather than retreating, and three of the animals sniffed at the
flame as long as it was presented. A similar test consisted in jabbing the
animals with a dissecting needle in the skin at the sides of the hind
thighs. After this experience, the restricted dogs spent more time near
the experimenter than before and generally behaved as if they were
“unaware that they were being stimulated by something in the
environment.”

The authors conclude that the lack of normal early perceptual
experience had influenced, at least in part, the development of certain
normal overt responses such as avoidance of noxious stimuli and had
prevented the normal development of the capacity to perceive normal
pain. They relate an anecdote concerning one of the restricted dogs.
There was a low pipe in the laboratory room, and one dog was observed
to hit his head on the pipe 30 times within an hour. Normal dogs would
not and did not perform in this manner. Such behavior could be attributed
to a failure to perceive normal pain, a failure to react to the pain with
avoidance, or sheer stupidity in failing to remember the earlier experience.
That the restricted dogs failed to perceive pain cannot be established,
given the subjective nature of the experience of pain, but the possibility
cannot be rejected for the same reason.

Whatever the nature of the restricted dogs’ deficiencies in situations
one would normally regard as painful, there is no doubt that the early
isolation of these Scottie dogs had profoundly affected the character of
their later development. Furthermore, although the evidence is not wholly
adequate, it suggests that equality with the normal littermates was never
achieved. Thus the deprivation of normal stimulation during the
development phase produced an animal that was deficient in intelligence, in
learning capacity, in the normal tendency to exhaust curiosity, in
emotionality, and possibly in the capacity to perceive pain. In the words of
Thompson and Melzack (1956), the early restriction has produced
animals that “may remain forever immature.”

SUMMARY 

It is the belief of many students of animal behavior that there are
extremely complicated patterns of behavior which may be described as
instinctive and which do not involve learning. Such patterns are presumed
to be built into the neurosensory structure of the organism, and to be
made ready for activation at an appropriate stage of maturity or
physiological condition. Ethologists refer to this readiness as an innate releasing
mechanism which is activated in the presence of a fixed stimulus pattern,
a sign stimulus. Instinctive behavior is stereotyped and rigid, and is subject
to little or no modification through experience.

Imprinting involves the association of a complex pattern of behavior
with whatever stimulus pattern is present the first time the behavior
pattern appears in the development of the organism. The complex
behavior patterns which can be imprinted include following,
brood-raising, and pecking at a given stimulus for food. Usually, imprinting
can occur only during a critical period. This and many other
characteristics distinguish imprinting from ordinary learning.

Finally, there seems little doubt that the early experience of the 
individual organism is especially important in its development. Depriva- 
tion of normal stimulation during the early phases of growth produces 
an immature organism whose capacity to learn seems permanently 
distorted and reduced.

The simplest forms of true learning are conditioning and instrumental
learning. They are quite general: in contrast to imprinting, they do not
appear to be limited to particular organisms, stimuli, or responses.
Because conditioning and instrumental learning are so simple and
general, theorists have held them to be models that reveal the basic nature
of all learning. However, their relative merit as paradigms has been and
continues to be a controversial issue.

CONDITIONING AND INSTRUMENTAL LEARNING
AS PARADIGMS

CLASSICAL CONDITIONING 

Conditioning is a form of learning in which the capacity to elicit a
response is transferred from one stimulus to another. The example of
Pavlov’s salivating dog is often cited because it was one of the first formal
conditioning experiments. Pavlov (1927) taught one of his laboratory
animals to salivate at the sound of a beating metronome by injecting
meat powder into the dog’s mouth after each of a number of presentations
of the sound. Pavlov’s conditioning procedures are termed “classical”
because of their historic significance in psychology. Perhaps one reason
for the prominence of this particular example of conditioning is that most
of us find ourselves similarly conditioned to a variety of sights and
sounds: our mouths water at the sight, smell, or even thought of a
favored food.

The terms that describe conditioning can be illustrated by Pavlov’s
experiment. The meat powder that produced salivation without training
was an unconditioned stimulus (US) producing an unconditioned
response (UR). After the conditioning procedure, the metronome sound
that produced salivation was a conditioned stimulus (CS) producing a
conditioned response (CR). (Before conditioning, the metronome sound
did not produce the relevant response and was therefore a neutral
stimulus.) These simple relationships are diagramed in Figure 3.1.

In conditioning, the CR sometimes appears to be identical with the
UR, possibly differing only in amount. However, in most circumstances,
the two responses are clearly different. Kimble (1961, pp. 52-59) says
that the two most commonly held views of the relation between the UR
and CR are that the CR is either a fractional component of the UR or that
it is a preparatory response in anticipation of the US.

Figure 3.1. Diagrams of the classical-conditioning and
instrumental-learning paradigms which give rise to
substitution and reinforcement models for learning. In
the classical conditioning diagram, CS stands for
conditioned stimulus; US, unconditioned stimulus;
UR, unconditioned response; CR, conditioned
response. In the instrumental learning diagram, SS stands
for stimulus situation, R for available response.

It is clear that classical conditioning involves substantially more
variability of behavior than does instinct or imprinting, both of which
are characterized by rigidity, inflexibility, and lack of major variation.
Conditioning is generally regarded as the simplest form of true learning.

INSTRUMENTAL LEARNING 

The paradigm of instrumental learning is also diagramed in Figure
3.1. It is the essence of instrumental learning that a response is
instrumental in the achievement of a goal. Suppose a hungry organism is
offered a choice of four responses which are in all respects (save one)
equally attractive. The four responses are performed and one of the four
(in the diagram, R1) leads to food. On subsequent occasions, the response
that leads to food will be selected and will tend to be dominant whenever
the organism is hungry and is in the same “stimulus situation” (SS).
This form of learning is called instrumental learning because one of the
responses is instrumental in goal achievement. It is also called selective
learning because the rewarded response comes, through learning, to be
selected at the expense of the responses which do not lead to the reward.

Instrumental learning involves more variability of behavior and a
wider range of responses than classical conditioning does. When faced
with a number of equally attractive alternatives, an organism tends to
alternate between responses. (Thus, in the situation diagramed in
Figure 3.1, if the four alternatives are equally likely at the outset, they
would each have a probability of occurrence of .25. If the organism
chooses R1 on the first trial, the probability that R1 will occur on the
second trial is considerably reduced, and the probabilities of the other
responses occurring on the second trial are raised accordingly. Such
“alternation phenomena” occur with high reliability in most
instrumental-learning situations.) This tendency to vary responses rather than to repeat
them produces more rapid exploration of the environment than would
occur by chance.
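
The claim that alternation speeds exploration can be checked with a
small simulation. The sketch below is a toy model, not a description of
any experiment. It assumes four responses, assumes that the response just
made is down-weighted on the next trial by a made-up factor
(`repeat_weight`), and counts the trials needed before every response has
been tried at least once. All names and parameter values are illustrative
assumptions.

```python
import random


def trials_to_sample_all(n_responses: int, repeat_weight: float,
                         rng: random.Random) -> int:
    """Count trials until every response has been made at least once.

    The response made on the previous trial gets relative weight
    repeat_weight; every other response gets weight 1. A value of 1.0 is
    pure chance, and values below 1.0 model a tendency to alternate.
    """
    responses = list(range(n_responses))
    tried = set()
    last = None
    trials = 0
    while len(tried) < n_responses:
        weights = [repeat_weight if r == last else 1.0 for r in responses]
        last = rng.choices(responses, weights=weights)[0]
        tried.add(last)
        trials += 1
    return trials


def mean_trials(repeat_weight: float, runs: int = 5000, seed: int = 0) -> float:
    """Average the trial count over many simulated animals."""
    rng = random.Random(seed)
    total = sum(trials_to_sample_all(4, repeat_weight, rng) for _ in range(runs))
    return total / runs


chance = mean_trials(repeat_weight=1.0)       # each response stays at .25
alternating = mean_trials(repeat_weight=0.2)  # a repeat is made less likely
print(f"chance: {chance:.2f} trials, alternation: {alternating:.2f} trials")
```

On the first trial no response has yet been made, so all four start at
.25, as in the text. With any `repeat_weight` below 1.0 the simulated
animal needs fewer trials on the average to sample all four responses
than it would by chance.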

OTHER PROCEDURES 

Several other procedures are frequently employed, although none of
them has gained the theoretical importance of classical conditioning and
instrumental learning of the response-selection type.

Avoidance conditioning is similar to classical conditioning except
that performance of the response in avoidance conditioning prevents the
occurrence of the US. For example, if a CS tone is followed, inescapably,
by the presentation of a shock to the foot of a dog, classical conditioning
will occur. In the avoidance-conditioning procedure, the apparatus is arranged
so that the raising of the foot prevents the shock. Conditioning under
classical and avoidance procedures may proceed quite differently, and
there is considerable debate concerning the relationship of the two
procedures.

“Escape learning” usually involves the application of a noxious US
alone in a situation in which the performance of the response leads to
escape. In avoidance conditioning, the interval between the CS and the
US is usually long enough to permit the response to occur before the US
would be applied. If that interval is reduced to the point at which the
CS and US appear simultaneously, avoidance is impossible because there
is no warning signal, and avoidance conditioning becomes identical with
escape learning.

“Temporal conditioning” is another procedure in which a
manipulated CS is omitted. The US, such as meat powder, is injected into the
dog’s mouth without warning, but on a regular temporal schedule. Under
these conditions, salivation will tend to occur in a temporal rhythm, in
the absence of the US.

Temporal conditioning has been combined with instrumental learning
by arranging a situation in which an instrumental response can prevent
the occurrence of a noxious US. Thus a rat may be shocked at regular
intervals, such as once a minute, unless he presses a bar within the
interval. The apparatus may be arranged so that pressing the bar
postpones the next shock for some fixed interval, perhaps a minute.
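
The logic of such an apparatus can be written out as a short scheduling
routine. The sketch below is a simplified illustration that assumes
times are measured in seconds from the start of a session. The function
name, the parameter names, and the sample press times are hypothetical,
and the one-minute values simply follow the example in the text.

```python
from typing import List, Sequence


def shock_times(press_times: Sequence[float], session_length: float,
                shock_interval: float = 60.0,
                postponement: float = 60.0) -> List[float]:
    """Return the times at which shocks are delivered during a session.

    With no presses, a shock arrives every shock_interval seconds. Each
    bar press made before the scheduled shock resets the schedule so that
    the next shock is due postponement seconds after the press.
    """
    shocks: List[float] = []
    presses = sorted(press_times)
    next_shock = shock_interval
    i = 0
    while next_shock <= session_length:
        if i < len(presses) and presses[i] < next_shock:
            # The press comes first, so the pending shock is postponed.
            next_shock = presses[i] + postponement
            i += 1
        else:
            shocks.append(next_shock)
            next_shock += shock_interval
    return shocks


print(shock_times([], session_length=300))
print(shock_times([50, 100, 150, 200, 250], session_length=300))
print(shock_times([30], session_length=200))
```

In this sketch an animal that never presses is shocked once a minute, an
animal that presses every 50 seconds receives no shocks at all, and a
single early press merely shifts the schedule.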

There are numerous other procedural variations, many of which are 
discussed by Kimble (1961, Chapter 3).

THE GENERALITY OF CONDITIONING 
AND INSTRUMENTAL LEARNING 

It is widely believed that any physical energy change that an
organism can respond to in any observable fashion can serve as a CS.
There appears to be no evidence contrary to this generalization.

There also seems to be very little limitation on the character of
responses that can be conditioned or used as instrumental responses. A
great many skeletal movements have been used as UR. Conditioning
has been successful with involuntary skeletal responses such as the knee
jerk in response to a tap on the patellar tendon. Many types of
autonomic responses, such as the galvanic skin response and vasomotor
reactions, have been successfully conditioned. Some difficulty is
occasionally encountered in the conditioning of simple reflexes such as the
patellar reflex and the pupillary reflex (Kimble, 1961). In the period of
the past 45 years, some controversy has existed concerning the
conditionability of the pupillary reflex to light and to electric shock. Kimble
(1961) lists citations of eleven successful and nine unsuccessful attempts,
and suggests that success seems more probable when shock is used as
the US. Physiological responses such as gastric secretions in response
to food, nausea and vomiting induced by morphine, and immunity
reactions induced by injections of toxins and antigens have been
conditioned. Furthermore, conditioning has been carried out successfully
when the CS was direct stimulation of the brain, thus bypassing normal
receptor organs, and when the UR was the blocking of the “alpha
rhythm” in the cortex, thus bypassing any normal effector activity. (The
alpha rhythm is a characteristic frequency of the electroencephalogram.)

Age appears to be a factor in the ease of conditioning, although there
appears to be no age (for which there are suitable CSs and USs) at
which conditioning cannot occur. Conditioning has been reported in
chick embryos in the fifteenth day of incubation (Hunt, 1949) and in the
human fetus in the seventh and ninth months of pregnancy (Spelt,
1948). The sucking reaction of newborn infants has been conditioned to
the sound of a buzzer by Marquis (1931). A reduced potentiality for
conditioning in older people has been reported by Braun and Geiselhart
(1959).

While there is little question concerning the conditionability of higher
organisms, controversy does occur in the interpretation of results of
experiments demonstrating the classical conditioning of very simple,
primitive, invertebrate organisms. Thompson and McConnell (1955)
report the classical conditioning of planaria, a simple flatworm.
Subsequent investigators have reported mixed success, but there is little doubt
that classical conditioning extends well into the invertebrate world
(Jacobson, 1963; McConnell, 1966).

CONDITIONING AND “TRUE ASSOCIATION”

Conditioning is presumed to involve the establishment of an
association between the CS and either the US or the UR. To make certain
that such an association has been truly established, it is necessary to
distinguish other forms of effects that can occur when the CS is followed
by the US. Although it has not often been tested or reported, the
occurrence of the CS immediately before the US can influence the
character and intensity of the response to the US. The effect can be either
an enhancing effect, as one would expect if the CS produces an increase
in the alertness of the subject, or an inhibiting effect if the CS produces
an inappropriate diversion of attention. Such interaction effects can
occur on the very first presentation.

Another form of change in the response that can occur during
conditioning is adaptation to the stimuli involved, chiefly to the US.
This effect can be demonstrated by presenting the US alone a number
of times. Suppose a dog is placed in the stock and harness of the typical
conditioning experiment and is shocked on the left forepaw a number of
times. The initial reaction may be a violent and generalized threshing
about. As the animal is shocked repeatedly, he will gradually become
less excited, many responses of a gross bodily character (that serve no
function in escaping the shock) will disappear, and the response of lifting
the foot will tend to become more precise and deliberate. None of these
changes in the response to the shock is generally regarded as
associational.

Two other effects noted in classical conditioning that are sometimes
regarded as nonassociational are sensitization and pseudoconditioning.
The augmentation of the response to the CS through the conditioning
procedure is called sensitization. On the other hand, presentation of the
US can produce a general sensitization of the organism to any stimulus
with the result that a response that looks somewhat like the CR can
occur on the first presentation of the CS. When such an effect occurs
during the course of conditioning, it is referred to as pseudoconditioning.

There are generally two ways of discriminating between
pseudoconditioned responses and true conditioning. In most conditioning
experiments, the time interval between the CS and US is sufficiently long
(e.g., .5 second) that the CR, when it appears, tends to have a substantial
latency, occurring at first only slightly in anticipation of the US. A
pseudoconditioned response, presumably arising from sensitization during
conditioning, tends to follow a CS almost immediately. It is frequently
possible to separate the responses into two distinct categories on the basis
of the latency and to show that pseudoconditioned responses tend to
decline in number during training while true conditioned responses, with
their longer latencies, are increasing in number. The second way of
distinguishing between pseudoconditioned responses and true conditioned
responses is by means of appropriate control groups in the design of the
conditioning experiment. For example, if one group has 50 paired
presentations of the CS and US, a control group might have a random
mixture of 50 presentations of the CS and 50 presentations of the US.
Any responses occurring to the CS in the control group could be
attributed to pseudoconditioning, and the number and course of such
responses could be compared with the number and course of responses
by the experimental group.
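
Both checks amount to simple bookkeeping, which the following sketch
illustrates. The latency cutoff, the latencies themselves, and the
response counts are invented for the example. The text gives only the
.5-second CS-US interval and the design of 50 presentations, so every
other number here should be read as an assumption.

```python
from typing import Dict, List


def classify_by_latency(latencies: List[float], cutoff: float) -> Dict[str, int]:
    """Split response latencies (seconds after CS onset) at an assumed cutoff.

    Short-latency responses are candidates for pseudoconditioning, and
    long-latency responses are candidates for true conditioned responses.
    """
    short = sum(1 for t in latencies if t < cutoff)
    return {"short": short, "long": len(latencies) - short}


def excess_over_control(paired_responses: int, control_responses: int) -> int:
    """Responses to the CS in the paired group beyond the unpaired control."""
    return paired_responses - control_responses


CUTOFF = 0.25  # assumed boundary, chosen only for illustration

# Hypothetical latencies from an early and a late block of training.
early_block = [0.08, 0.10, 0.12, 0.41, 0.09]
late_block = [0.40, 0.44, 0.11, 0.38, 0.42]

print(classify_by_latency(early_block, CUTOFF))
print(classify_by_latency(late_block, CUTOFF))

# Hypothetical totals out of 50 CS presentations in each group.
print(excess_over_control(paired_responses=32, control_responses=6))
```

With these made-up figures the short-latency responses predominate early
and the long-latency responses predominate late, which is the pattern
the latency argument expects, and the difference between the groups
estimates how many responses reflect pairing rather than sensitization.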

However, not everyone agrees that pseudoconditioning is truly
pseudo and nonassociational. The CS and US in an experimental situation
tend to have much in common that is not shared by other stimuli. Both
usually occur in a sound-resistant room with the organism protected from
any source of uncontrolled stimulation. Thus both appear and disappear
in the same surroundings with sharp onset, sharp offset, and controlled
duration. When a response that appears similar to a CR occurs in
response to the CS without prior pairing with the US, that response
could result from a true association based upon the common
characteristics of the two stimuli. This interpretation would explain why the
associated response would have a short latency (would occur almost
immediately), because the association between sudden onset, for example,
and a shock is one of nearly absolute contiguity.

Another phenomenon, conditioned suppression of the UR, has also
been noted occasionally. After a number of pairings of the CS and US,
the UR may be observed to be diminished in both amplitude and
duration. The interjection of a trial in which the CS is omitted may then result
in a restoration of the original vigor of the UR. This phenomenon is
clearly associational, since it is a product of paired presentation, and has
been attributed to Pavlovian inhibition of delay by Kimble and Ost
(1961). Inhibition of delay refers to response inhibition in the early part
of a long CS-US interval.

IS LEARNING A UNITARY PROCESS? 

If learning is a single unitary process that can be measured in a
number of different ways, various measures of learning should be highly
correlated. There are a number of experimental studies in which different
measures of learning were obtained in the same situation. A sample of 
correlations between these learning measures is contained in Table 3.1. 
In general, such correlations tend to be low.

Table 3.1

Correlations between measures of learning

Authors                 Campbell &    Campbell     Kellogg &     Brogden
                        Hilgard       (1938)       Walker        (1949)
                        (1936)                     (1938)
Response                Eye blink     Knee jerk    Leg flexion   Leg flexion
Organism                Man           Man          Dog           Dog

Frequency and
  Amplitude             .63           [illegible]  .94
Latency and
  Amplitude             .15           -.27         -.22
Frequency and
  Latency               -.54          -.27         -.18
Frequency and
  Resistance to
  Extinction            .60 [column unclear]
Trials to Learn and
  Trials to Extinguish  [value missing]

The highest correlations
tend to occur between the frequency and the amplitude of the response
where the criterion for the occurrence of a response is usually stated in
terms of a minimum amplitude, and a substantial correlation is
guaranteed. The low correlations could be attributed to errors in measurement
and thus low reliabilities of the measures, or one could ask whether
different measures might actually be measuring different things.
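
The effect of unreliable measurement on a correlation can be made
concrete. The sketch below computes an ordinary product-moment
correlation and then applies the standard attenuation relation, in which
the observed correlation equals the true correlation multiplied by the
square root of the product of the two reliabilities. The scores and the
reliability values are hypothetical and are not taken from the studies
cited in Table 3.1.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def attenuated_r(true_r: float, reliability_x: float,
                 reliability_y: float) -> float:
    """Correlation expected between two imperfectly reliable measures."""
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical scores for six subjects (not data from any cited study).
frequency = [12, 30, 45, 50, 22, 38]        # percent of trials with a CR
amplitude = [1.1, 2.0, 2.4, 3.1, 1.9, 2.2]  # arbitrary amplitude units

print(round(pearson_r(frequency, amplitude), 2))
print(round(attenuated_r(0.9, 0.5, 0.4), 2))
```

Under these assumed values, two measures that would correlate .90 if
measured perfectly are expected to correlate only about .40 when their
reliabilities are .50 and .40. Low observed correlations therefore do not
by themselves show that the measures reflect different processes.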

It has been suggested (1) that amplitude and latency might be most
appropriate to autonomic conditioning and (2) that other measures might
be more appropriate to the learning of overt movements. This possibility
receives some support from the fact that certain variables tend to affect
some measures and not others. For example, Hillman, Hunter, and
Kimble (1953) report that running speed in a maze is influenced by
differences in motivation, while the number of errors is not. Estes (1944)
reported that the rate of responding in a lever-pressing task is reduced
by punishment, while the number of responses is not. Kimble (1961) has
discussed this problem extensively, but no simple solution has emerged.

The role of volition in conditioning is also relevant to our discussion.
This issue is not easily resolved, although there have been a number of
experimental investigations. Typically, the role of volition has been
examined in studies of human eye-blink conditioning. Two examples are
a study by Hilgard and Humphreys (1938) and one by Gormezano and
Moore (1962). In both studies, variation in instructions to the subjects
permitted or discouraged voluntary effort. In both cases, substantial
conditioning occurred under “involuntary” instructions (such as not to
let voluntary effort influence the response), but the various measures of
strength of conditioning all showed greater learning after instructions
that encouraged voluntary effort (such as a suggestion that the subject
blink so as to avoid the puff of air to the cornea).

Two additional factors should be mentioned. The performance of a
response is determined in part by the strength of learning and in part by
situational factors. The greater the effect of situational factors in
performance, the smaller the expected correlation between different measures
of learning, since different measures might be affected differently. It is
also true that both amplitude and latency can be, and often are, learned
to some specific value. In avoidance conditioning, the animal learns to
respond fast enough to avoid the shock, and in lever-pressing situations,
the animal learns to press just hard enough to activate the mechanism.
Such criteria, when operating in a situation, would reduce the correlations
between either amplitude or latency and any other measure of learning.

Whether learning is a unitary process remains an open question.
However, a variety of reactions to the problem have been adopted by
psychologists working in the field. Skinner (1938, 1953, 1961) has selected
the number of responses and the rate of responding as measures of
learning and has chosen to ignore other indices. Tolman (1932) chose to
deal only with the percentage of choices distributed among alternatives.
Hull (1943, 1952) attempted to develop a complex theoretical formulation
that made amplitude, latency, probability of response, and trials to
extinction products of a single theoretical variable which he called
“excitatory potential.” Low correlations were accounted for by introducing
a threshold for response and a concept of variability that affected each
measure independently.

For the purposes of this book, the question of the unitary character
of learning must remain unresolved.

SOME ISSUES OF LEARNING THEORY 

CONTIGUITY AND REINFORCEMENT AS
PRINCIPLES OF LEARNING

Let us pose a question that is fundamental to learning theory. Is
contiguity of stimulus and response a sufficient condition for learning, or
does learning require reinforcement?

We have seen that the temporal contiguity of a CS and a US is
important in conditioning. Some theorists rely on conditioning as a model
of learning and therefore stress a primary role for contiguity in all
learning.

However, an additional principle seems needed to explain selective
(instrumental) learning. In the case diagramed in Figure 3.1, the
contiguity of the stimulus situation (SS) with each of the four available
responses may have been the same. Yet, in the course of the experiment,
one response came to be performed in preference to the other three.
It is obvious that the food reward distinguished the response that was
selected. This reward appears to have reinforced the response, that is, to
have strengthened the association of the stimulus situation with the response.

There are three main approaches to the problem of contiguity versus
reinforcement. (1) One can choose the principle of contiguity as the
simplest and most parsimonious way to explain learning. Having made
this choice, one must somehow explain selective learning without
reference to the principle of reinforcement. This approach is sufficiently
attractive that a number of learning theorists have chosen to construct
theories which eliminate reinforcement as a necessary condition of
learning or which assign it a minor and limited role. (2) One can be a
two-factor theorist and assert that, for some kinds of learning, both
contiguity and reinforcement are necessary. (3) One can take the position
that the principle of reinforcement underlies all learning. Having taken
this position, one must show the role of reinforcement in classical
conditioning.

It is relatively easy to account for the phenomena of classical
conditioning with the principle of reinforcement. In nearly every
classical-conditioning situation, some incentive substance, such as meat
powder, is present, or there is an aversive stimulus such as shock, the
reduction of which could serve a reinforcing role. Since reinforcement
always seems to be present in learning, the question of whether or not it
is required for learning would seem to be moot.

COGNITIVE AND S-R THEORIES

Another fundamental issue in learning theory is the relative merit
of cognitive and stimulus-response theories. A strict S-R theory would
limit experimenters to explicit and measurable stimuli and responses
(such as grams of meat powder and drops of salivation). A cognitive
theory, on the other hand, might speak freely of variables like
“expectancy.” In the field of learning, all theories have tended to deal
with objective stimulus characteristics and measurable response
properties, but their treatment of variables that intervene between
stimulus and response has varied.

According to what might be called strict behaviorism, all learning
and perhaps all behavior should be described in terms of overt stimuli
and responses. (The appeal of conditioning as the basis for all learning
stems largely from the fact that all its terms are overt, directly
measurable, and therefore wholly objective.) Although there is a certain
amount of appeal in such an “atheoretical,” radical empiricism, almost
insurmountable difficulties arise in attempting to apply it. If one follows
this approach (sometimes called the “empty organism” approach), one
cannot make a distinction between learning and performance, a
distinction most theorists find necessary. A simple illustration can be
drawn from the example of salivary conditioning in Pavlov's laboratory.
The occurrence of the CR after conditioning depended on three conditions.
Salivation occurred (1) to the sound of the metronome, (2) in the
experimental room, (3) when the dog was hungry. The sound of the
metronome in the living quarters did not produce salivation even when
the dog was hungry. If the dog was not hungry, he did not salivate to
the sound of the metronome in the experimental room. In these two
situations, it is clear that even though the salivary response had been
learned, it was not performed. Thus, the evidence suggests that
nonperformance cannot be interpreted to mean that no learning has
occurred, although strict behaviorism would apparently require this
interpretation.

The major difference between what we shall call theoretical
behaviorism and a cognitive approach is the character of the intervening
variables they employ. Theories that adhere to the behavioristic
tradition, and use concepts derived from empirical stimulus-and-response
research, are frequently referred to as S-R theories. Cognitive theories,
on the other hand, tend to use intervening variables that have a cognitive
connotation.

Figure 3.2 represents the character of the three positions following
the conventions of Walker, Psychology as a Natural and Social Science
(1967). Strict behaviorism is shown with no intervening variables. (The
closest approximation to a strict behaviorism is the theory of Guthrie
[1952].) The S-R theory is represented in the diagram by the habit
construct of Hull (1943, 1952). The H, which stands for “habit,” is
bracketed by s and r to indicate that the bond is conceived to be between
some internal representation of the stimulus and the response. The
equivalent representation of the results of experience in a cognitive
theory, such as that of Tolman (1932), is “expectancy.”

Figure 3.2

Diagrams contrasting the theoretical positions of strict behaviorism,
S-R theory, and cognitive theory. The empirical variables S and R appear
in all three diagrams. Strict behaviorism shows no intervening variable;
the S-R theory places the habit construct (H bracketed by s and r)
between S and R; the cognitive theory places “expectancy” between S and R.

The issue of whether an S-R theory or a cognitive theory is better
cannot really be resolved here, since the difference is largely a difference
in experimental strategy. However, there are very practical differences
between the two approaches. Adoption of an S-R theory implies a belief
that changes in the value of intervening variables will be explained from
the findings of strictly controlled conditioning or instrumental-learning
experiments. Adoption of a cognitive theory implies that changes
occurring with experience may follow principles derived from cognitive
research, and may or may not follow strictly from conditioning and
instrumental-learning models.

Though the differences between S-R theories and cognitive theories
are complex and sometimes subtle, they have generated a great many
research studies aimed at establishing one or the other as the better
theoretical and research strategy. The question of whether reinforcement
or contiguity is the basic principle of learning has been equally powerful
in stimulating research. The experiments discussed in the next chapter
pertain largely to these two theoretical issues.

The conditioning and instrumental-learning paradigms, as they are
diagramed in Figure 3.1, are the predominant models that theorists have
used to explain the association and selection of responses. While it is
possible to treat the two paradigms separately, there is sufficient
similarity of variables and problems relevant to the two to make it possible
to organize the research on the learning process in terms of problems
rather than in terms of paradigms.

This chapter will treat six basic problems. The most basic problems
concern (1) acquisition, (2) transfer, and (3) elimination of learned
responses; a section will be devoted to each of the three. The next three
sections will review research on (4) the nature of reinforcement,
(5) explorations of the necessary conditions for learning, and (6) the
problem of learning to learn.


ACQUISITION OF RESPONSES 

While the response to be learned in both conditioning and
instrumental learning is in the repertory of the organism before the
learning process begins, the CS acquires the capacity to elicit the CR, and
the organism acquires the tendency to choose one among a number of
available alternative responses. A number of basic issues in learning are
associated with the acquisition phase of learning.

THE EASE OF CONDITIONING AND 
INSTRUMENTAL LEARNING 

An important issue in both conditioning and instrumental learning
is the relative ease with which a given response may be conditioned or
learned. It is generally true, in agreement with Pavlov’s advice (1927),
that any full, vigorous UR may be conditioned.

Conditioning is usually rapid when a sudden, strong, fright-producing
stimulus is used. Sometimes, a single pairing of CS and US is sufficient
in such a case. Conditioning tends to occur more slowly, if at all, when
spinal preparations are used and in certain simple reflexes such as the
abdominal, patellar, plantar, and pupillary reflexes. Kimble (1961, pp.
50-52) cites a great many studies demonstrating the controversy over
how amenable these responses are to conditioned association.

Since interest is generally in the learning process itself and in the
factors which influence the course of learning, experimenters tend to
study conditioning that involves responses that are learned neither so
rapidly that observation of effects is difficult nor so slowly that the patience
of both experimental organism and experimenter is tried.

Instrumental learning problems may be made easy or difficult at
will, requiring a single trial or hundreds of trials for successful learning.
No one has, as yet, developed a quantitatively meaningful scale of
problem difficulty, although many problems would be aided by the
development of such a dimension.

THE AMOUNT OF TRAINING 

The amount of training in conditioning is usually manipulated in
terms of the number of times the CS and the US are paired, while
instrumental learning is usually manipulated in terms of the number of
trials on which the organism has an opportunity to make the correct
response and to be reinforced.

NUMBER OF CS-US PAIRINGS 

The tendency of the conditioned response to occur in the presence
of the CS alone develops gradually in most conditioning experiments.
A number of CS-US pairings is usually necessary. A curve showing the
gradual acquisition of the galvanic skin response (GSR) as a function
of the amount of training is shown in Figure 4.1. In this study by Hovland
(1937c), a tone was used as CS. The US was a mild electric shock to the
wrist of the human subjects. The response to the electric shock was a
drop in skin resistance. Hovland divided 128 subjects into four groups of
32 subjects each. In one group, the CS and US were paired eight times
before the CS was presented alone to test for the strength of conditioning.
(The amplitude of the GSR to the CS in the absence of the US was then
taken as the measure of the strength of conditioning after eight trials.)
A second group had 16 conditioning trials, a third had 24, and a fourth
had 48 trials before the CS was presented alone. Figure 4.1 indicates
that, with increased training, the magnitude of the CR increased, and, by
inference, the strength of the connection between the CS and CR was
increased.

Figure 4.1

Acquisition of a conditioned galvanic skin response. The CS was a sound
and the US a mild shock to the wrist of human subjects. (The data are from
Hovland, 1937c. Copyright 1937 by the American Psychological
Association; reproduced by permission. The drawing is adapted from
Figure 21 of Principles of Behavior by C. L. Hull. Copyright 1943 by
D. Appleton-Century Co. Reprinted by permission of
Appleton-Century-Crofts.)

The large number of subjects in Hovland’s study, combined with
Hovland’s exceedingly careful experimental control, yields a much
smoother curve than is usually obtained. The curve is of the negatively
accelerated exponential type. A negatively accelerated exponential curve
is one which gains a constant fraction of the distance remaining to its
asymptote on each trial. The negatively accelerated exponential curve
is the one that is obtained most frequently in studies of learning, but
other curve shapes can also occur. One is an S-shaped curve. In some
conditioning procedures, no conditioned response may be detected over
a number of trials. This period may then be followed by a rising curve
that is positively accelerated, one that gains more on each trial than it
did on the previous trial. Such curves then tend to become linear and
then negatively accelerated, forming a rude S shape. It will be noted
that the latter portion of an S-shaped curve is identical in form with the
negatively accelerated curve shown in Figure 4.1. A convincing kind of
explanation of the two kinds of curve has been offered by Spence (1956).
He suggests the S-shaped curve represents the complete course of
conditioning and that negative exponential curves are obtained when the
conditioning procedure is applied in situations in which some previous
learning is effective. Thus the prevalence of the negative exponential
curve arises from the fact that conditioning studies usually involve CS-US
associations in which some degree of connection exists before training
begins; they represent the latter portion of an S-shaped curve.
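
The defining property of the negatively accelerated exponential curve, a
constant fraction of the remaining distance to the asymptote gained on
each trial, can be illustrated with a short Python sketch. The function
name, the asymptote of 100, and the fraction of one half are assumptions
chosen for illustration; they are not values taken from Hovland's data.

```python
def exponential_learning_curve(asymptote, fraction, trials, start=0.0):
    """Return response strength before training and after each trial.

    On every trial the curve gains `fraction` of the distance that still
    separates the current strength from the asymptote.
    """
    strengths = [start]
    for _ in range(trials):
        current = strengths[-1]
        strengths.append(current + fraction * (asymptote - current))
    return strengths


# Hypothetical parameters: asymptote 100, half the remaining distance per trial.
curve = exponential_learning_curve(asymptote=100.0, fraction=0.5, trials=4)
gains = [later - earlier for earlier, later in zip(curve, curve[1:])]

print(curve)  # [0.0, 50.0, 75.0, 87.5, 93.75]
print(gains)  # [50.0, 25.0, 12.5, 6.25]
```

Each successive gain is smaller than the one before it, which is what
"negatively accelerated" means. Starting the same rule from a nonzero
value of `start` would mimic the case in which some degree of connection
exists before training begins.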

NUMBER OF TRIALS IN INSTRUMENTAL LEARNING

A learning curve similar to that obtained by Hovland appears in
instrumental learning. Learning (which is here equated to habit strength)
increases as a function of the number of times the response is reinforced.
Figure 4.2 shows the effects of the number of reinforcements on habit
strength. Harris and Nygaard (1961) studied rats pressing a lever. One
group of rats was given 45 reinforcements in one day; another group
90 reinforcements in two days; another 360 reinforcements in eight days.
When reinforcement was removed, the number of responses to extinction
(disappearance of the response after reinforcement has been removed)
was recorded. The point on the curve for no reinforcements was obtained
by fitting a curve to the other three points and determining the zero
value by extrapolation. This curve is almost indistinguishable from the
conditioning curve obtained by Hovland in classical conditioning and
shown in Figure 4.1.

Figure 4.2

Effect of number of reinforcements on habit strength measured in terms
of number of trials to extinction. (Adapted from Harris and Nygaard, 1961.
Copyright 1961 by Southern Universities Press; adapted by permission of
the authors and publisher.)

OVERLEARNING

Another curve form that occurs occasionally in conditioning studies
is one that rises and falls under continued pairing of the CS and US.
An example may be seen in Figure 4.3. In this experiment three rats
were trained to raise their heads to a 60-degree angle from the horizontal,
a head position which avoided a dazzling light. The CS was a tone, and
the CS-US intervals were three, five, and 10 seconds, with each animal
conditioned at a different CS-US interval. Each animal had five trials a
day for 25 days with a mean intertrial interval of about 40 seconds. The
tendency to respond by lifting the head first increased and then decreased
over the 25-day period. The increase and decrease in performance under
continuation of the conditions that produced the learning was of interest
to Pavlov (1927), although he did not investigate it extensively. Hilgard
(1933) published a figure showing an S-shaped curve followed by a
decline under continued CS-US pairing in human eyelid conditioning, and
Hovland (1939) referred to it as attributable to the “inhibition of
reinforcement.” No simple explanation has been offered. Since the response
to the US tends to adapt or to decrease in magnitude within a few trials,
it is common practice to increase the intensity of the US within the first
few trials of acquisition of the response. The usual effect is to restore
the response. However, there are times when an increase in the intensity
of the US is ineffective. This, along with the phenomenon indicated in
Figure 4.3, seems to require a distinction between learning and
performance, since performance declines under conditions of learning. This
distinction will be discussed more fully in a later section.

Figure 4.3

Percent avoidance responses of three rats trained under three different
CS-US intervals. The rise and fall of the learning curve occurred under
continued pairing of the CS and US. (The study is by Walker and Earl as
reported in Walker, 1964. Copyright 1964 by the Nebraska University
Press. Curve adapted by permission of the author and publisher.)

Increasing the number of reinforcements in instrumental learning
does not always lead simply to asymptotic performance. There are times
when performance tends to decrease as a result of overlearning. An
example may be seen in Figure 4.4. In this study by Ashida (1963), a rat’s
starting speed in a simple runway rose and then began to fall under
continued reinforcement. In an earlier study by Kendrick (1958), thirsty
animals running for water were reported first to have increased running
speed and then to have decreased speed until all animals refused to run
at all even though water was still available at the end of the maze. Fuchs
(1960) failed to replicate the Kendrick results, but there can be little
doubt that there are conditions under which increasing numbers of
reinforcements result in decreasing performance. It seems likely that, in
these conditions, habit strength either continues to increase with training
or becomes asymptotic, and therefore has a fixed value, while the incentive
value of the reward undergoes a change.

Figure 4.4

A curve that rises and falls under continued reinforcement. Starting
speed of animals given one trial a day in a runway for 42 days with
32 grams of wet mash as a reward. (The data are from Ashida, 1963, as
they appeared in Walker, 1964. Copyright 1964 by the Nebraska University
Press. Curve adapted by permission of the author and publisher.)

EXTERNAL INHIBITION 

Pavlov (1927) observed that the introduction of a novel stimulus
during conditioning will usually produce a decrement in the response.
He named the phenomenon external inhibition. Whether a novel stimulus
will produce a decrement or an increment in the value of the response
depends on the character of the response being learned and the
character of the response induced by the novel stimulus alone. In a situation
involving instrumental learning, Winnick and Hunt (1951) produced
response decrements in running speed by presenting a novel stimulus, a
four-second buzzer, just before the gate was raised to allow the animal
access to an elevated runway. When the buzzer was introduced for the
first time on the fourth training trial, it took the experimental animals
more than 30 seconds to run; the control group, without the novel
stimulus, ran in less than 10 seconds. As training proceeded, the amount
of decrement produced by the introduction of the buzzer for the first
time became progressively less. Thus, on the fourteenth trial, the control
group ran the maze in 2.3 seconds, while the experimental group,
experiencing the buzzer for the first time just before this trial, required
3.89 seconds. Similar decrements in response have been shown to result
from the removal, as opposed to the addition, of a part of the stimulus
complex (Fink and Patton, 1953).

A novel stimulus can also induce an increase in the measured value of
a CR. Kimmel and Greene (1964) conditioned the GSR to a visual
stimulus in human subjects, and introduced an auditory stimulus
(a 3,000 cps tone) during the presentation of the CS at different
stages of training for different groups. Before training, both the CS
and the novel tone produced a GSR, and both stimuli presented together
produced a larger GSR than either alone. As training proceeded, the
GSR to the CS increased and then decreased slightly. Addition of the
novel tone for the first time, however, had an increasingly dramatic
effect, producing a progressively larger amplitude of the GSR. On the
fiftieth trial of training, the CS and the novel tone together produced
about 10 times the GSR that was produced by the CS alone.
Thus, the introduction of a novel stimulus during training can produce
either a response decrement or a response increment, and the phenomenon
is probably inappropriately named external inhibition.

Note that the addition or subtraction of a stimulus creates a new
and different situation. When a response, such as a GSR, is carried over
from one situation to another, it is called a generalized response. Stimulus
and response generalization will be discussed in the next section of this
chapter on the transfer of responses.

INTENSITY VARIABLES 

THE DISTINCTION BETWEEN HABIT AND DRIVE 

Theorists usually distinguish between habit and drive on the basis
of the distinction between learning and performance. Suppose we place
a hungry organism in a complex situation in which food is available if the
organism solves a problem. At first, the organism is active, but his
performance is poor. After a number of trials, he learns to perform the
response quickly and without error. Suppose that later we place him in
the same situation when he is not hungry. His performance is likely to
be again poor, even though we have no doubt that he has learned to
solve the problem.

If habit (H) and drive (D) determine performance (P), one
possible relationship between these variables is

P = H × D.

Thus, the value of P will be low if either D or H is low, and will be
maximal only when both D and H are high (i.e., when the animal has
learned to solve the problem and is motivated to do so). The terms
“habit” and “drive” are fairly specific to S-R theory (discussed on page
26), but most theories of learning make a similar distinction. Thus,
the term “habit” might be exchanged for any number of other terms
that represent the role of historical variables, those that determine what
the organism “knows” about the situation on the basis of previous
experience. The term “drive” might be exchanged for other words that refer
to the situational variables, those that influence performance. These are
essentially the ones that can be manipulated between trials or experiences
in the situation.
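
The multiplicative relationship between habit and drive can be made
concrete with a small Python sketch of the hungry-organism example. The
0-to-1 scale and the particular values assigned to each condition are
hypothetical, chosen only to show that performance is high when both
factors are high and low when either one is low.

```python
def performance(habit, drive):
    """Multiplicative rule P = H x D (inputs on an assumed 0-to-1 scale)."""
    return habit * drive


# Hypothetical (habit, drive) values for the three situations in the text.
conditions = {
    "untrained, hungry": (0.1, 0.9),
    "trained, hungry": (0.9, 0.9),
    "trained, not hungry": (0.9, 0.1),
}

for label, (habit, drive) in conditions.items():
    print(f"{label}: P = {performance(habit, drive):.2f}")
```

Under this rule a well-learned response is not performed when drive is
near zero, which is one way of stating the distinction between learning
and performance.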

DRIVE LEVEL AND PERFORMANCE 

There is general agreement that, if H is held constant, performance
will first increase and then decrease as drive is increased. A curve
showing the relationship between performance and drive might look
something like the inverted U-shaped function in Figure 4.5. A number
of explanations have been offered for this shape. One possibility is that
as the time without food grows longer and longer, the organism becomes
more and more active until he starts to grow weak. As he approaches
starvation, his performance declines until, if no food becomes available,
he eventually dies. On the other hand, poor performance sometimes
occurs in conditions of high motivation and appreciable habit strength,
even though physical weakness is not a factor. It has been suggested
that the greater the habit strength, the stronger the drive can be without
producing deterioration in performance. This phenomenon is sometimes
referred to as the Yerkes-Dodson Law (1908). Thus the athlete with
little skill does poorly under pressure, while the highly skilled athlete
produces his best performance when the pressure is greatest.

Figure 4.5

Inverted-U curve showing performance as a function of drive strength.

A curve such as that in Figure 4.5 is the kind most generally expected
when performance is simple activity or a specific goal-directed
performance. However, there have been studies of the strength of drive in
which such a curve has not been obtained. For example, Hall, Low, and
Hanford (1960) reported no differences in the activity of hungry and
satiated animals in a Dashiell checkerboard maze. One explanation of
such failures has been offered by Sheffield and Campbell (Sheffield and
Campbell, 1954; Campbell and Sheffield, 1953; Campbell, 1960). They
suggest that no difference in activity level will appear in an unchanging
environment. What these experimenters found was that when activity was
measured in the presence of stimulus changes occasioned by the
operation of ventilation fans and the switching on of lights, deprived
animals increased their activity more than nondeprived. Furthermore, if
deprived animals were given water on a regular basis shortly after the
period of stimulation, so that the changing stimulus patterns could come
to serve as a cue for subsequent reinforcement, the changing stimuli
produced even greater increases in activity. Thus, the presence of a drive
does not seem a sufficient condition for activity; a changing environment
and the expectation of drive reduction seem to be necessary factors.

Another modification of a simple statement of a relation between
drive level and performance comes from a study of the influence of
temporal feeding patterns on the amount of activity. Birch, Burnstein,
and Clark (1958) fed rats from a trough only for the same two-hour
period each day for five weeks. When this feeding rhythm was well
established, food was omitted. Depressions of the trough, taken as the
measure of activity, were narrowly confined to the period of usual
feeding, and declined during the hours when the animals were not normally
fed. This study appears to make clear that the relation between drive
strength and activity may frequently be affected by learned temporal
rhythms.

CS INTENSITY 

Does the strength of the CS influence the strength of association in
conditioning? If conditioning proceeds equally well with very weak
stimuli and with intense ones, the frequency with which conditioning
occurs under everyday circumstances would be very much greater than
would be the case if effective conditioning occurred only with very strong
stimuli. Unfortunately, the problems involved in experimental
demonstrations of the effects of stimulus intensity upon the strength of the
association (as opposed to the vigor of the performance of the response)
appear nearly insurmountable. The difficulty in demonstrating the effect
of CS intensity on the strength of conditioning is best seen in the context
of a well-designed study. Grant and Schneider (1949) conditioned the
GSR in human subjects by pairing a tone with a shock to the
wrist. Sixteen groups of five subjects were used. Four intensities of CS
were used during training and during extinction of the response. Each
of the groups was trained with one CS intensity; then extinction was
carried out with either the same or a different intensity. The procedure
involved 10 trials of adaptation to the sound of the tone alone, 20 paired
presentations of the CS and US, and then 10 extinction trials with one of
the CS intensities presented alone. The score of each group was the sum
of the magnitudes of the GSRs during the extinction trials. The results are
shown in Table 4.1. The logic of this design is precise. The means of the
rows represent the differences in learning attributable to CS intensity.
Since these differences do not show an orderly relationship to CS
intensity during conditioning, Grant and Schneider conclude that CS
intensity cannot be said to have affected strength of conditioning. The
means of columns represent the effects of the intensity of the CS during
extinction, since each mean has an equal representation of each CS
intensity during conditioning. Here the orderly increase in score with
increase in CS intensity, along with other evidence from the study,
supports the conclusion that the intensity of the CS during extinction
does affect the performance of the response.

Table 4.1

Effects of CS intensity during conditioning and extinction as measured by
the magnitude of the GSR during extinction (Grant and Schneider, 1949)

CS intensity (dB)          CS intensity (dB) during extinction
during conditioning       76       86       96      106     Means

        106             199.5     77.6    155.8    230.8    165.8
         96             119.8    297.4    258.6    427.1    275.8
         86             119.0    155.6    331.4    270.8    219.2
         76             265.1    213.0    180.2    244.2    225.7

      Means             175.8    185.9    231.5    293.3    221.6
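
The row-and-column logic of the Grant and Schneider design can be checked
with a short Python sketch. The cell values below are the sixteen group
scores as read from Table 4.1; the variable and function names are
illustrative assumptions. Means recomputed from the cells may differ from
the printed means by about 0.1, presumably because of rounding in the
original.

```python
# Columns: CS intensity (dB) during extinction.
extinction_db = [76, 86, 96, 106]

# Rows: CS intensity (dB) during conditioning -> scores for each column.
scores = {
    106: [199.5, 77.6, 155.8, 230.8],
    96: [119.8, 297.4, 258.6, 427.1],
    86: [119.0, 155.6, 331.4, 270.8],
    76: [265.1, 213.0, 180.2, 244.2],
}


def mean(values):
    return sum(values) / len(values)


def is_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Row means: effect of CS intensity during conditioning (learning).
row_means = {db: mean(row) for db, row in scores.items()}
# Column means: effect of CS intensity during extinction (performance).
col_means = {
    db: mean([row[i] for row in scores.values()])
    for i, db in enumerate(extinction_db)
}

rows_by_intensity = [row_means[db] for db in sorted(row_means)]
cols_by_intensity = [col_means[db] for db in sorted(col_means)]

print("row means ordered by dB:", [round(m, 1) for m in rows_by_intensity])
print("orderly increase across rows?", is_increasing(rows_by_intensity))
print("column means ordered by dB:", [round(m, 1) for m in cols_by_intensity])
print("orderly increase across columns?", is_increasing(cols_by_intensity))
```

The column means rise steadily with extinction intensity while the row
means do not follow conditioning intensity in any orderly way, which is
the pattern on which the conclusions in the text rest.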

Grant and Schneider's additional conclusion, that the intensity of
the CS does not affect the strength of learning (conditioning), is
probably a sound one, yet there are some residual unresolved problems.
The conclusion rests on the assumption that the generalization gradient
for intensity (tendency to make some response to stimuli differing in
intensity from the CS) is a straight line and that generalization effects
are equal in the testing stage. This assumption is reasonable but not
certain, and in this case is supported by the fact that no significant
generalization effects are apparent in the data. On the other hand, the
data reflect considerable variability. It is conceivable that some effect of
CS intensity during conditioning might have been found with an
extremely large number of subjects, and positive results might have been
found with a greater range of CS intensities.

US INTENSITY 

The logic of the design of a conditioning study to show the effect
of the intensity of the US on learning, as opposed to performance, is
similar to that for the intensity of the CS, but the problems of
interpretation are more difficult and involved. Spence (1953) has
reported a study of the intensity of the US that will serve as an example.
He conditioned a group of 80 men, using an increase in the brilliance
of an illuminated disc as the CS and a puff of air to the cornea as the US,
to produce a reflexive wink. The CS lasted 825 milliseconds, and the US
came on 755 milliseconds after the CS and lasted for 50 milliseconds.
On the first day, half of the subjects had a weak puff (.25 lbs. per sq. in.)
as the US, and the other half had a strong puff (5.00 lbs. per sq. in.) as
the US. Each group had 30 conditioning trials on the first day. On the
second day they had 20 additional paired presentations of the CS and
US, but for half of each group, the intensity of the US was changed to the
other value. The frequencies of CRs in the four groups during the 20
trials of training on the second day were then reported (Table 4.2). The
logic is similar to that in the study of CS intensity above.


Table 4.2

Effects of US intensity on learning and performance. Mean number of CRs
made in first 20 trials of Day 2 (Spence, 1953)

                               Day 2 US
                        (pounds per square inch)       Means
                            .25         5.00         (Learning)

Day 1 US           .25     5.65         8.80            7.23
(pounds per
square inch)      5.00     7.45        13.00           10.23

Means (Performance)        6.55        10.90


Spence argues for the following interpretation: Within each column, the
US intensity was different on Day 1 and the same on Day 2. Therefore, the
difference in number of CRs, e.g., the difference between 8.80 and 13.00,
must be attributed to the difference in US intensity on Day 1. The
implication is that the different intensities of US produced different
amounts of learning on Day 1. Within each row, the US intensity was the
same on Day 1, but different on Day 2. Therefore, the difference in number
of CRs, e.g., the difference between 5.65 and 8.80, is a joint product of
the effect of different intensities of US on both learning and performance
on the second day.
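
The marginal means of Table 4.2 can be recomputed from the four cell
means. The following is only a minimal sketch of that arithmetic; the
names cell_means, learning_means, and performance_means are illustrative
and do not come from Spence's report.

```python
# Cell means from Table 4.2 (Spence, 1953): mean number of CRs in the
# first 20 trials of Day 2. Keys are (Day 1 US, Day 2 US) in lbs. per sq. in.
cell_means = {
    (0.25, 0.25): 5.65,
    (0.25, 5.00): 8.80,
    (5.00, 0.25): 7.45,
    (5.00, 5.00): 13.00,
}
levels = (0.25, 5.00)


def learning_means(cells):
    """Row means: average over the Day 2 US for each Day 1 US intensity."""
    return {d1: sum(cells[(d1, d2)] for d2 in levels) / len(levels)
            for d1 in levels}


def performance_means(cells):
    """Column means: average over the Day 1 US for each Day 2 US intensity."""
    return {d2: sum(cells[(d1, d2)] for d1 in levels) / len(levels)
            for d2 in levels}


rows = learning_means(cell_means)
cols = performance_means(cell_means)

for d1 in levels:
    print(f"Day 1 US {d1:.2f}: learning mean = {rows[d1]:.3f}")
for d2 in levels:
    print(f"Day 2 US {d2:.2f}: performance mean = {cols[d2]:.3f}")
```

The row means come out at 7.225 and 10.225, which the table reports as
7.23 and 10.23; the column means are 6.55 and 10.90.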

Among the problems which make interpretation of the results
difficult are the following. (1) Different US intensities may be regarded
as producing qualitatively different responses and not simply the same
response differing only in vigor. (2) Different responses may differ in
ease of conditioning, and the differences in Table 4.2 might be due to
qualitative differences in the responses. (3) There is an implicit
assumption that the loss in response strength through intensity
generalization is the same when one changes from a weak to a strong
stimulus as when one changes from a strong stimulus to a weak one. (4) The
performance on the second day of training represents a confounding of the
effects of the US intensity on learning and performance within that day.
These problems, and others, were discussed by Spence. Many subsequent
efforts have been made to demonstrate the differential effects of US
intensity on learning and performance. These studies involve the
manipulation of many kinds of variables and the use of complicated logic
and experimental designs (for example, see Spence and Tandler, 1963). No
completely satisfactory resolution to the problem is available.

The duration of the US might be looked on as a variable that
influences the intensity of the effects of the US. The duration of the US
has also been of special interest to theoretical positions such as those of
Mowrer (1960), who attributed different roles to the onset and the offset
of the US. Kimble (1961) interprets a number of such studies as showing
no effect of duration.

EFFECT OF DRIVE STRENGTH ON LEARNING 

Despite its importance, the question of the effect of drive strength
on habit strength has proved nearly unsolvable. This problem is similar
to that involved in the analysis of the effect of US intensity on strength
of association. When drive strength is varied during learning, higher
drive produces a higher performance level. To test whether this difference
reflects a difference in habit strength, the standard procedure is to
reverse the drive levels for half of each group during further training
or extinction. The difficulties involved in interpreting such data in terms
of the effect of drive strength on learning may be seen in a study by
Butter and Campbell (1960). They trained two groups of rats to run in
a straight runway. They varied drive strength by giving the animals
either 3 or 10 grams of their daily 15 grams of food 30 minutes before
testing. Animals given only 3 grams were assumed to be high-drive
animals and those given 10 grams were considered to be low-drive
animals. They ran two trials a day for five days under the original drive
conditions with 2 grams of food reward on each trial. At the end of the
first block of five days, the levels of hunger drive were reversed for the
two groups. On the eleventh day they were reversed again, and the
reversal process continued through five blocks. Thus the drive levels
for one group were high-low-high-low-high in successive blocks of trials,
and for the other group they were low-high-low-high-low. The results
are plotted in Figure 4.6. The most general result is that the group
that started at high drive ran faster throughout the 25 days than the
group that started at low drive.

Figure 4.6

Mean running speed plotted as a function of successive reversals of
the hunger drive. Each point is a mean of 10 trials in five days at
one drive level. Each group was reversed in drive level after each
block of 10 trials. Horizontal axis: successive 10-trial blocks.
(Adapted from Butter and Campbell, 1960. Copyright 1960 by The
American Psychological Association and reproduced by permission.)

The logic of the drive-reversal experiment can now be applied to the
results of any successive pair of blocks of trials. Let us add another term
to the expression of the determiners of performance, I, for incentive.
The general formula is:

P = D × H × I

Let us compare the performances of the two groups on the first two
blocks of trials in Figure 4.6 combined. The incentive (I) is equal for
the two groups, because 2 grams of food was given each animal on each
trial through both blocks. Drive (D) should be equal, because each
group was under each drive for one whole block, and the mean of the
performances for the first two blocks should represent each drive strength
equally. If both D and I are equal for the two groups, then differences
in performance must be due to differences in habit strength, H. The
performances are different between the two groups when the first and
second blocks are combined; they are also different when any other pair
of successive blocks is combined: the group which had the high drive
first is consistently found to run faster. Is the habit strength higher in
this group, and if it is, why should it be? The only difference between
the two groups is that one started the experiment at a low drive and the
other started at a high drive. It would be possible to argue that habit
strength, H, should also be equal between the two groups, since they
had had equal numbers of trials under each drive strength. Yet
performance in the second block is essentially the same, even though the
drive strengths differ, and the near identity can presumably be accounted
for by assuming different values of H.
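
The inference in the preceding paragraph can be made concrete with a
small numerical sketch. All values below are hypothetical, chosen only to
illustrate the multiplicative expression; they are not estimates from
Butter and Campbell's data, and the sketch makes the simplifying
assumption that H stays constant across the two blocks being averaged.

```python
def performance(drive, habit, incentive):
    """P = D x H x I, the multiplicative expression used in the text."""
    return drive * habit * incentive


def mean_over_blocks(drives, habit, incentive):
    """Mean performance over successive blocks run under the given drives.

    Simplifying assumption: habit and incentive are constant across blocks.
    """
    return sum(performance(d, habit, incentive) for d in drives) / len(drives)


# Hypothetical values, for illustration only.
HIGH_D, LOW_D = 2.0, 1.0
INCENTIVE = 1.0  # same 2-gram reward for every animal on every trial

# If both groups had the same habit strength, the order of the drive
# levels would not matter once two successive blocks are combined.
high_first = mean_over_blocks([HIGH_D, LOW_D], habit=0.5, incentive=INCENTIVE)
low_first = mean_over_blocks([LOW_D, HIGH_D], habit=0.5, incentive=INCENTIVE)
print(high_first, low_first)

# A consistent advantage for the high-first group therefore has to be
# carried by H in this expression, since D and I are matched.
high_first_larger_h = mean_over_blocks([HIGH_D, LOW_D], habit=0.75,
                                       incentive=INCENTIVE)
print(high_first_larger_h > low_first)
```

With D and I matched across the combined blocks, the expression leaves
only H to absorb a group difference, which is exactly the conclusion the
text goes on to question.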

One can account for these results only by either challenging the
logic of the drive-reversal experiment or questioning the applicability of
that logic in this particular instance. The first course is difficult and
unlikely to offer a simple resolution in a short treatment. Butter and
Campbell (1960) offer an explanation of their results which questions
the applicability of the logic and at the same time poses a general
problem for all similar research designs on the problem. The logic
assumes (1) that if stimuli arising from the drive state itself are a part
of the problem, then they differ only in intensity, and (2) that the
response of running down the runway is in all cases the same in character,
differing only in vigor. Cotton (1953) did a runway experiment in which
drive strength was varied, and in which he counted the number of
irrelevant or interfering responses that occurred. If he counted only trials
during which no interfering responses occurred, running speed did not
differ very much with difference in hunger drive. The implication is that
the responses that occur under high and low drive are qualitatively
different. Butter and Campbell suggest that in the initial block of learning
trials in the experiment, the high-drive animals learned to run, while
the low-drive animals learned some incompatible, nonrunning responses.
Therefore, the responses learned by the two groups were qualitatively
different. Thus, in order to investigate the effect of drive strength on
learning, it is necessary to vary drive strength during learning, yet to do
so tends to produce qualitative differences in the responses learned.
However, even in the face of these difficulties, the most frequent
conclusion is that drive strength affects performance but does not affect
the habit strength or the amount learned.

AMOUNT OF REWARD 

Do differences in the amount of reward influence performance by
changing habit strength, drive level, incentive value, or combinations of
these elements? The best tests of this question are probably those in
which an effort is made to vary the amount of reward without varying
the nature of the consumption of the reward. If an animal is given a large
amount of food, his consummatory behavior will be different (more
prolonged) than if he is given a small amount of food.

A study that attempted to avoid this problem is one reported by
Bower, Fowler, and Trapold (1959). They varied the amount of
reinforcement by varying the amount of shock reduction an animal received
for running a short runway. They ran three groups of animals with 250
volts on the grid of the starting box and runway. Since the voltage level
was the same for all three groups, it is presumed that they were equally
motivated. Reinforcement was varied by having the goal box charged
with 50 volts less, 100 volts less, or 200 volts less than the runway. The
animals were kept in the goal box for 20 seconds after they had reached
it. In Figure 4.7, it is apparent that the greater the drop in voltage, the
faster the animals ran to get to the goal box, even though the intensity
of the shock on the grid was the same for all while they were in the
starting box and runway.

Figure 4.7

Effect of amount of reward, defined as the amount of reduction in
shock level in the goal box, on performance. Horizontal axis: amount
of shock reduction in goal box (in volts). (Drawn from data reported
by Bower, Fowler, and Trapold, 1959. Copyright 1959 by The
American Psychological Association and reproduced by permission.)

A study by Tombaugh and Marx (1965) illustrates another approach
to the problem. They varied the amount of reward by presenting all
animals with the same amount of liquid to reinforce lever-pressing
responses, but varied the concentration of sucrose in the liquid. They
trained four groups of eight animals each to press a bar to receive a small
dipper full of liquid. Each group received a different concentration
of sucrose. During each of 24 days of the experiment, each animal was
reinforced during the first four of eight two-minute periods and was not
reinforced during the last four periods. Thus it was possible to measure
the number of bar presses per period during reinforcement and during
nonreinforcement. Figure 4.8 makes clear that the concentration of
sucrose, and thus the amount of reward, did affect performance in both
daily phases of the experiment. However, the effects were different in
the two phases. Performance during reinforcement appears to increase
and then decrease as the amount of reward increases, while performance
during nonreinforcement increases with concentration. Tombaugh and
Marx note that there is evidence that animals tend to spend a greater
length of time licking the dipper when the concentration is greater, and
there is some possibility that with the stronger solutions, satiation occurs
more quickly. This could account for the upper curve in Figure 4.8, in
which performance tends to decrease at higher concentrations. The
performance during extinction, however, is not affected by any
consummatory behavior. It appears to reflect differences attributable to
the experiences of the animals during reinforcement.

Figure 4.8

Effect of quality of reinforcement on performance during periods of
reinforcement and periods of nonreinforcement. (Adapted by permission
from Tombaugh and Marx, 1965. Copyright 1965 by The American
Psychological Association and reproduced by permission.)

Different amounts or qualities of reward appear to have their effects
upon the incentive value, I, rather than on habit or drive. A further
development of the experiment by Bower, Fowler, and Trapold (1959)
involved changing the amount of shock reduction in the goal box for
subgroups taken from the original groups. In all such subgroups, the
running speed quickly changed. Thus, for example, animals shifted from
a 200 volt reduction to a 50 volt reduction slowed up so much within five
trials that their performance was indistinguishable from that of animals
that had been running for a 50 volt reduction from the beginning.
Likewise, a group shifted from a 50 volt reduction to a 200 volt reduction
speeded up to the performance level of animals running for a 200 volt
reduction in shock in the goal box from the beginning. It should be noted
in passing that there have been times when results different from those of
Bower, Fowler, and Trapold have been obtained. For example, Crespi
(1942, 1944) in shifting amounts of food reward obtained an "elation"
effect from increasing the amount and a "depression" effect from
decreasing the amount. Such findings cannot be ignored, but involve
complications which cannot be dealt with here. The Bower, Fowler, and
Trapold results are more typical of studies involving incentive shifts.¹


¹ The argument that it is incentive value rather than habit that changes in the Bower,
Fowler, and Trapold (1959) experiment is based, at least in part, on the conception of
habit as representing the permanent effects of training. By this conception, H may
increase but not decrease. Since animals which were running rapidly for a 200 volt
reduction in shock slowed up when shifted to a 50 volt reduction, then the only term in
the expression

P = D × H × I

that could change to decrease the value of P would be I.

If H can only increase, while I can rise or fall as the incentive is changed, then there
are problems with the results of Tombaugh and Marx (1965) in that Figure 4.8 shows an
effect of amount of reward on the number of responses under nonreinforcement
conditions. With the incentive at a zero value, all performances should be the same.
A possible answer is that different amounts of reward produce different amounts of
secondary (or learned) reinforcement and that in the Tombaugh and Marx experiment
the short extinction period along with the repeated periods of reinforcement and
nonreinforcement served to make differential amounts of secondary reinforcement
effective during the extinction period.
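
The footnote's argument about which term must have changed can be
sketched numerically. The figures below are hypothetical and serve only to
show the shape of the inference under the stated assumptions (D held
constant, H unable to decrease); the function name implied_incentive is
illustrative.

```python
def implied_incentive(p, drive, habit):
    """Solve P = D x H x I for I, given observed P and assumed D and H."""
    return p / (drive * habit)


# Hypothetical values, for illustration only.
DRIVE = 2.0             # same shock level in the runway before and after
HABIT_BEFORE = 0.5      # habit strength at the moment of the shift
P_BEFORE, P_AFTER = 4.0, 1.0   # performance falls after the shift

i_before = implied_incentive(P_BEFORE, DRIVE, HABIT_BEFORE)

# If H may grow but never shrink, then H after the shift is at least
# HABIT_BEFORE, so the incentive after the shift can be no larger than
# the value obtained by holding H at HABIT_BEFORE.
i_after_upper_bound = implied_incentive(P_AFTER, DRIVE, HABIT_BEFORE)

print(i_before, i_after_upper_bound)
print(i_after_upper_bound < i_before)
```

Under these assumptions a drop in P with D unchanged forces the implied
value of I downward, whatever further growth in H took place.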
ACQUIRED DRIVE AND REWARD

The formula that has been used to distinguish the individual effects
of drive, habit, and incentive on performance more or less implies that
learning consists of a change in the value of H, habit. However, the
situation can be considerably complicated by the likelihood that stimuli
or objects can themselves acquire the capacity to induce a drive state
or can acquire reward value. Thus all three terms (drive, habit, and
incentive) may be affected by learning. In fact, it is generally assumed
that neutral stimuli, which are designated as "neutral" because they do
not appear to have properties of drive, incentive, or reinforcement, can
acquire these properties through appropriate conditioning procedures.
It is assumed that a neutral stimulus paired with the onset of a strong
drive stimulus, such as shock, can acquire drive properties of its own; a
neutral stimulus presented in close association with the offset of a drive
stimulus such as shock, or with the presentation of a reward such as food
or water, can take on incentive and reinforcing properties of its own.

ACQUIRED DRIVE 

The classic demonstration of acquired drive is a study reported by
May (1948). Rats were trained to jump across a hurdle from one side of
a box to the other to escape shock. Once the animals had learned to jump
promptly whenever the shock was turned on, the second phase began.
In this phase the animals were confined to a small compartment in the
center of the apparatus, and given a series of pairings of a buzzer and
shock in the classical conditioning paradigm. In the third phase, the
small compartment was removed, and the buzzer was sounded alone,
without shock. In this phase about 80 percent of the animals jumped
the barrier to escape the buzzer, which was turned off when they jumped.
Control animals jumped less promptly and less frequently. Presumably,
the buzzer had acquired drive properties to motivate the jumping
response. Furthermore, termination of the buzzer may have become an
incentive and may have acquired reinforcing properties.

A study by Goldstein (1960) explored the effect of drive strength
as represented by shock strength, the number of classical conditioning
trials, and the order of the CS and US on acquired drive. In contrast to
the demonstration procedure used by May, Goldstein did not give the
animals prior training in hurdle jumping. He simply presented a five
second CS and delivered shock during the last second of the CS. To
test the effect of shock, Goldstein used a two-chamber box; he presented
the CS and opened the door between the chambers simultaneously. When
an animal jumped the hurdle into the other compartment, the CS was
turned off. The test consisted of 15 trials of hurdle jumping in a two hour
period; the performance measure was speed of escape. He used three
different intensities of shock, and in the test phase the speed of hurdle
jumping seemed to increase in an orderly way with intensity of the shock
during conditioning. Thus the more intense the shock during
conditioning, the stronger the acquired drive. However, Goldstein's results
when he manipulated the number of CS-US pairings and the order of
CS and US did not follow the pattern to be expected in classical
conditioning. He had different groups with either one, three, nine, or 20
CS-US pairings, and speed of hurdle jumping did not increase as the
number of conditioning trials increased. Furthermore, he had one group in
which the CS followed the most intense shock by 20 seconds, and was thus
in a "backward" conditioning pattern; this group jumped the hurdles
about as fast as the average performance of the groups with the same
number of CS-US pairings in the standard order. Thus, while a neutral
stimulus appears to acquire drive properties through
classical-conditioning procedures, the acquisition process does not seem to
follow all of the laws of classical-conditioning responses.

ACQUIRED REWARD 

A demonstration of the power of a secondary reward is that of
Ehrenfreund. He set up a simple problem in which rats were
required to discriminate between black and white for a food reward.
Two groups were run which differed in one respect only. An empty food
cup was placed on the incorrect side of a single-choice-point maze for
one group and not for the other. Since a food cup was always present
when the animal received food, food cups could be expected to take on
secondary reward properties. If a cup had secondary reward properties,
it could be expected to reinforce the incorrect response to some extent
and thus delay the learning of the correct response. In fact, the group
with the food cup on the incorrect side did take nearly twice as many
trials to learn and made more than twice the number of errors as the
other group that did not have the empty food cup on the incorrect side.

This demonstration, alone, does not establish the existence of
secondary reward, since it is possible that any additional small object
in the incorrect goal box might have made that side more attractive.
However, other studies have succeeded in manipulating the secondary
reward value of a single stimulus by manipulating variables which could
be expected to vary the amount of learning. In a study of bar pressing,
Miles (1956) varied the number of times a dim light and click were
paired with reward and then extinguished half of his animals with the
click and light accompanying the lever press and half without. He
found that the greater the number of pairings of the light and sound with
food reward, the longer the extinction was delayed. Thus the strength
of the secondary reward effect increased in an orderly way with the
number of trials. Miles also varied the strength of the hunger drive
during extinction and found that the stronger the drive, the longer the
animals continued pressing without primary reinforcement. Thus, it is
clear that the secondary reinforcement effect is not simply a matter of
the presence of any additional stimulus.

Efforts to demonstrate greater secondary reinforcing value for a
stimulus paired with a larger reward have frequently been unsuccessful.
However, in a study by Butter and Thomas (1958), positive results were
obtained. They trained two groups of rats in a Skinner box, giving
0.1 milliliters of solution for each of 48 presses. The reward was an
8 percent sucrose solution for one group and a 24 percent solution for
the other. During extinction, for these two groups, the delivery
mechanism was allowed to click, but for a control group this source of
secondary reinforcement did not occur. The control group gave 3.9
responses in extinction, the 8 percent solution group gave 9.4, and the
24 percent solution group gave 17.6. Thus the click of the mechanism
appears to have acquired secondary reinforcing properties and to have
been more effective in the group with the higher concentration of
sucrose. It is difficult to say for certain that the differences were
exclusively attributable to differences in secondary reinforcing
properties, for the two experimental groups may have differed during the
training phase. However, complete separation of the effects of different
qualities of reward on what is learned as opposed to secondary reinforcing
effects during extinction is exceedingly difficult logically and
experimentally.

Another question of importance is the specificity of a secondary
reinforcing stimulus to the drive under which it acquired the property.
Will a stimulus that acquired secondary reinforcing properties under a
thirst drive serve as an incentive and as a reinforcing stimulus under
another drive or no drive at all? Estes (1949a, b) trained animals to press
a lever for water when thirsty, and then tested them for extinction under
either hunger or thirst. He found that the thirsty animals gave the
greater number of responses in extinction but the hungry animals also
extinguished slowly in the presence of the click of the mechanism. He
concluded that the original drive was not necessary for a secondary
reinforcing stimulus to be effective. It was only necessary to have a
sufficient drive present to instigate the activity that is to be reinforced.

A very difficult problem in the isolation of the secondary
reinforcement effect from other variables in a learning experiment arises
from the fact that for an animal to discriminate between one response and
another, some cue to the correct response must be provided. This cue,
since it is always followed by reward, will become a secondary reinforcer.
Thus the same stimulus acquires both a cue function and a reinforcing
function. An effort to distinguish between these two functions was made
by Dinsmoor (1950). He set up a discrimination problem in a
lever-pressing situation. One group of animals was normally in the dark, but
the light was turned on to signal the fact that a response would now lead
to reward. The first response made after the light came on was reinforced,
and the light was turned off when the animal seized a pellet of food and
started to eat. The light remained off until the animal had failed to
respond for at least 30 seconds. Thus the light became the cue for
responding and the darkness became a secondary reinforcing stimulus.
A second group had the cues reversed. During extinction, each group
was divided, with one half in the dark and the other half in the light.
Each response produced a three second change in the light pattern.
Thus for half, a response led to the cue for responding, and for the other
half, a response led to the secondary reinforcing stimulus. A pair of
control groups was extinguished either in continued darkness or
continued light, with responses producing no change in the stimulus
conditions. Both conditions in which a response during extinction
produced a change in the stimulus led to slow extinction compared to the
control group, and there was little difference between the two groups.
Even though, as will be noted later, stimulus change alone can serve as
an incentive in situations of this kind, the failure of the two stimuli to
have different effects on extinction represents a failure to show a
difference between the cue function and the reinforcing function of stimuli.

In spite of difficulties in separating the secondary reinforcing 
functions of stimuli from other roles in learning and performance, there 
can be little doubt that the concept of secondary reward is a meaningful 
one. Furthermore, in contrast to the results obtained in the case of 
acquired drive, the acquisition of secondary reward or reinforcement 
appears to follow the ordinary laws of learning. Miller (1951) has
assigned four roles to a stimulus that has acquired secondary reinforcing 
properties through association with primary reinforcement: (1) It can 
produce new learning. (2) It can support learned performance and 
prevent normal extinction. (3) It can serve to bridge a temporal gap 
between the response and delayed reinforcement. (4) It can have 
incentive function in that the presence of such a stimulus can produce 
approach activity.

TIME FACTORS AND TRACE CONCEPTS 

Both the rate of learning and the amount learned are dependent
upon temporal relationships. In conditioning, the important interval is
that between the presentation of the CS and the occurrence of the US.
In reinforcement learning, the important interval is between the
occurrence of the response and the appearance of the reinforcement. In any
learning situation in which training is demarcated in terms of trials, an
important interval is the time between trials (the intertrial interval).
While the problems are empirical, there is considerable interest in efforts
to explain the manner in which two temporally separated events are
bound together. One general concept is that of trace. If a stimulus occurs
at one point in time and can be shown to have an effect on a subsequent
event, then it must have left some sort of a trace of itself in the organism.
Logically, it is reasonable to talk about stimulus traces, response traces,
and memory traces. Furthermore, a distinction is frequently made
between an active trace and an inactive one. An active trace is conceived
to be a perseveration of neural activity for a period of time after the
stimulus is removed. The second kind of trace is thought to involve some
structural change in the nervous system that is relatively permanent and
can endure for long periods of time.

CS-US INTERVAL 

If the CS has sufficient duration, possibly as long as a second, there
are two intervals of interest. One is the time from the beginning of the
CS to the beginning of the US. The other is the time between the end
of the CS and the beginning of the US. The beginning and the end of a
CS appear to have different values as cue properties in learning. Kish
(1955) ran a pair of studies in which a light either coming on or going
off was used as the CS to signal shock. The subject could stop the shock
by turning a wheel. Learning was faster at several CS-US intervals when
turning the light on was used as the CS.

This difference is reflected in two different conditioning procedures.
In trace conditioning, the CS is usually brief, and the interval between
the onset of the CS and the onset of the US is taken as the CS-US interval.
Since no relevant stimulus is presumed to be present during the interval,
the effects of the US must then be associated with the trace of the CS.
In delayed conditioning the CS is continued until the US is presented,
and both stimuli are usually terminated together. In this procedure the
CS is present when the US is presented, and this bridges the gap between
the beginning of the CS and the onset of the US.

There is general agreement that the optimal CS-US interval in
classical trace conditioning is approximately .5 second. For example,
Reynolds (1945) conditioned a human eyeblink with a click in earphones
as the CS and a puff to the cornea as the US. He found the greatest number
of CRs in 90 training trials when the CS-US interval was 450σ
(milliseconds). There were fewer CRs at 250σ and at 1,150σ, and the number
of CRs at 2,250σ did not exceed the number expected when the two stimuli
are presented but unpaired. In fact, it is generally believed that no
conditioned association takes place in eyelid conditioning when the
interval is as long as 2,250σ, or 2.25 seconds.

In other forms of conditioning, longer intervals can lead to
association. Kamin (1954) conditioned dogs to jump a hurdle from one
compartment of a box to another to avoid a very strong shock. With a number
of different measures of learning, he obtained the best performance with
a five second CS-US interval in comparison with intervals of 10, 20, and
40 seconds. A sample of his results is plotted in Figure 4.9.

If the CS follows the US, instead of preceding it, the procedure is
called backward conditioning. In most studies in which the order of the
stimuli is reversed, no conditioning occurs. However, there are
exceptions. Champion and Jones (1961) made a study of forward,
backward, and pseudoconditioning of the human GSR to shock. The CS
was a tone of 20σ duration, and the shock lasted 600σ. In the
forward-conditioning procedure, the onset of the tone preceded the onset of
the shock by 500σ. In the backward procedure, the onset of the shock
preceded the onset of the tone by 750σ, leaving an interval of 150σ between
the two stimuli. In the pseudoconditioning procedure, the tone and shock
were presented the same number of times but were not paired. The GSR
to the tone after training was greatest for the forward condition, was
substantial for the backward condition, and was minimal for the
pseudoconditioning procedure. It seems possible that backward conditioning
can occur when the response is one of fear or arousal but not when the
response is a muscular movement. The utility of fear arises when the
organism has gotten into a dangerous situation and survived. The cue
might well be the one present, during or even after the arousal of fear,
when the organism is looking around to see what might have signaled
the presence of the dangerous situation.

In delayed conditioning, the CS is continued in the interval between
the onset of the CS and the onset of the US, and the results are somewhat
different and more variable than is the case with trace conditioning.
Hartman and Grant (1962) conditioned the eyeblink in a discrimination
situation in which only one of two CSs was followed by the US. They
report the best conditioning to the positive stimulus at about 600σ and the
best discrimination at about 800σ, results that are not too different from
similar studies with trace conditioning. Gerall and Woodward (1958)
conditioned human pupil dilation and obtained the best results with a
CS-US interval of 1,500σ. Smaller pupil changes occurred with intervals
of 125σ, 500σ, and 2,500σ, but all of these showed conditioning when
compared to a group that had the stimuli unpaired. Kimmel and
Pennypacker (1963) obtained optimal GSR conditioning using shock as the
US in human subjects when the delay interval was 1,000σ. The best
discrimination occurred at 2,000σ. Fish conditioned with an illumination
change as CS and shock as US showed best conditioning at a 2,000σ CS-US
interval, and appreciable conditioning at other intervals ranging from
500σ to 4,000σ (Noble, Gruender, and Meyer, 1959). Ross (1961) paired
an illumination change with shock in human subjects. To measure the
effect, he delivered a puff of air to the cornea and noted the amplitude
of the blink. He found the best conditioning at 2,000σ and 5,000σ,
appreciable conditioning at 10,000σ, but little or none at 500σ, the optimal
interval in trace conditioning. Thus classical delayed conditioning
involving a variety of responses and subjects shows optimal intervals
ranging from 600σ to 5,000σ, sometimes shows little conditioning at
500σ, and frequently shows substantial conditioning at intervals longer
than the 2,250σ regarded as beyond the limit of association in classical
trace conditioning. Thus, the continued presence of the CS appears to
bridge the time gap and allow association over longer intervals.

Even longer intervals can be used in avoidance conditioning. In
fact, the shorter intervals of classical conditioning are frequently
ineffective. For example, Schwartz (1958) obtained better learning of a
shuttlebox response in rats with a six second delay between the CS and
the shock than he did with a three second delay. Low and Low (1962)
used intervals of two, four, six, eight, and 10 seconds in a similar study
and obtained better learning the longer the interval. Beyond this range,
Brush, Brush, and Solomon (1955) and Church, Brush, and Solomon
(1956) found very little difference in the shuttlebox performance of dogs
with intervals ranging from five to 40 seconds. Figure 4.9 shows the
results of Brush, Brush, and Solomon (1955) compared with those of
Kamin (1954). These studies, carried out in the same laboratory, are
probably sufficiently similar to permit the conclusion that the differences
are attributable to differences between the trace procedure (Kamin)
and the delayed procedure (Brush, Brush, and Solomon). The fact that
Kamin’s animals received a greater number of shocks before reaching
criterion as the interval was increased means that they were learning
less efficiently to jump the barrier.

The usual interpretation of CS-US interval, as it affects association,
is that there is a brief latency period after the onset of the stimulus before
it has its full effect on the organism. Therefore, the fact that conditioning
is better when the interval is about 500σ than when the two stimuli are
presented simultaneously is attributed to the likelihood that the CS has
reached its maximum effect after this length of time. The decrease in
effectiveness of a CS as the interval is lengthened beyond 500σ is
attributed to a generally decaying trace of the stimulus; this means a
progressively less efficient trace-US interaction when the US is presented.
The delayed procedure is presumed to maintain an active process
representing the CS and thus bridges longer intervals such as those used
in the Brush, Brush, and Solomon study.
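
The trace interpretation can be made concrete with a small numerical
sketch. The functional form below is an assumption chosen only for
illustration (a trace that rises to a peak and then decays); the studies
cited here specify no equation. The names `trace_strength` and
`delayed_strength`, and the use of 500σ as the peak, are hypothetical.

```python
import math

# Assumed interval (in sigma, i.e. milliseconds) at which the CS trace peaks.
PEAK_MS = 500.0


def trace_strength(interval_ms, peak_ms=PEAK_MS):
    """Hypothetical strength of the CS trace when the US arrives (trace procedure).

    The curve x * exp(1 - x), with x = interval / peak, is zero at x = 0,
    reaches its maximum of 1.0 at x = 1, and decays toward zero afterwards.
    """
    if interval_ms <= 0:
        return 0.0
    x = interval_ms / peak_ms
    return x * math.exp(1.0 - x)


def delayed_strength(interval_ms, peak_ms=PEAK_MS):
    """Delayed procedure: the CS stays on, so assume no decay after the peak."""
    if interval_ms >= peak_ms:
        return 1.0
    return trace_strength(interval_ms, peak_ms)


for interval in (0, 250, 500, 2250, 5000):
    print(interval, round(trace_strength(interval), 3),
          round(delayed_strength(interval), 3))
```

Under these assumptions the trace value falls steadily beyond 500σ, while
the delayed value stays at its maximum, which is one way to picture how a
continued CS could bridge longer intervals.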

That longer optimal intervals are obtained in avoidance conditioning
than are obtained in classical conditioning might be explained by the
fact that the latency of a complex instrumental response, such as a dog
jumping over a barrier in a shuttlebox, is much longer than the latency
of an eyeblink. Therefore, a longer CS-US interval is required for the
jumping response to occur in anticipation of the shock.

Figure 4.9

Comparison between the numbers of shocks received
before reaching criterion performance by dogs trained
to jump to avoid shock under (1) trace and (2)
delayed conditioning procedures. (These results are
replotted from Kamin, 1954, and Brush, Brush, and
Solomon, 1955. Copyright 1955 by The American
Psychological Association and reproduced by
permission.)

DELAY OF REWARD 

A significant time interval in selective learning through reinforcement 
is the interval between the occurrence of the response and the appearance
of the reward. One of the earliest studies of this variable showed no 
difference in learning between a condition in which the reward was 
immediately available and one in which it was delayed for 30 seconds. 
Watson (1917) trained rats to dig through sawdust to find a hole which 
permitted access to a food chamber. In the chamber, there was a food
cup covered by a perforated lid. In the immediate-reward condition, 
the animal was allowed to eat as soon as it reached the chamber. In
another group, the lid was removed 30 seconds after the animal reached
the food chamber. Both groups learned rapidly, and there were no
differences between them. 

Later studies found differences in learning with differences in delay
of reward, and Watson’s results were accounted for on the basis of the
likelihood that the smell of the food through the perforations in the lid
served as a secondary reinforcement of the response. In studies in which
some of the sources of secondary reinforcement were removed, a sharp
gradient of reward was found indicating that the longer the reward is
delayed, the slower is the learning. For example, in two studies by Perin
(1943a, b) it was estimated that no learning would occur if the reward
was delayed somewhere between 14 and 44 seconds. Both of these
studies were run in situations in which a lever could be presented to the
animal, the lever withdrawn when a response occurred, and the delivery
of the reward delayed at will. In a simple lever-pressing situation, some
learning occurred with a 10 second delay, but none apparently occurred
with a 30 second delay. In another situation in which the animal was
required to learn whether to push the lever to the left or right, the
obtained curves extrapolate to zero learning at 34-44 seconds.

A typical curve showing the relation of rate of learning to the delay
of reward is that in Figure 4.10 obtained by Grice (1948). He trained
animals to choose either a black or white stimulus, and then delayed the
animals in a gray chamber before admitting them to gray goal boxes. In
this way the secondary reinforcing properties of the black and white
stimuli were acquired under the same delay of reinforcement as the
response itself. With this control of secondary reinforcement, all animals
learned if the delay was two seconds or less, but two of 10 animals failed
to learn with a five second delay and three of five failed to learn with a
10 second delay. The plot in Figure 4.10 is of medians of the number of
trials to learn, and the dashed portion of the curve indicates that some
animals failed to learn within the time allowed by the patience of the
experimenter. A group that had black and white goal boxes, thus
permitting acquisition of secondary reinforcement without the delay,
learned in 1.55 trials with a five second delay between the response and
access to the goal box.

Figure 4.10

Effect of delay of reinforcement on the number
of trials to meet the criterion of learning in a
black-white discrimination problem. Axis label:
response-reinforcement interval in seconds. (Data are
from Grice, 1948. Copyright 1948 by The American
Psychological Association and reproduced by permission.)

Findings such as those of Grice suggest that the gradient of reward
might be largely a matter of the delay between a relevant stimulus and
the reward rather than between the response and the reward. This
suggestion leads to experiments such as that of Bersh (1951), the results
of which are shown in Figure 4.11. In this study, different values of
secondary reinforcement were set up by pairing a light with the delivery
of food pellets to hungry animals. Bersh used a delayed conditioning
procedure in varying the CS-reinforcement interval. For different groups
the light came on 10, 4, 2, 1, .5, or 0 seconds before the delivery of
food and remained on for two seconds after the pellet was delivered.
The acquired reinforcement value of the light was then tested by
introducing a lever which produced a one second light when pressed.
The animals were placed in the box for 45 minutes on each of two
successive days. Figure 4.11 is a plot of the median numbers of presses
of the lever each group produced for no other reward than the light.
In a similar study, Jenkins (1950) demonstrated a small degree of
secondary reinforcement when the delay was either 27 or 81 seconds.

On the basis of results such as these, it appears possible that there is
no gradient of primary reward. To be effective, reward, either primary or
secondary, must be immediate. Learning occurs when there is a time
gap between the response and the reward because, as suggested by
Miller (1951), previously neutral stimuli acquire secondary
reinforcement value and thus bridge the time gap.

Figure 4.11

The strength of secondary reinforcement acquired
at different delay intervals between the CS and the
reinforcement, measured in terms of the number of
responses performed to produce the CS in two 45
minute periods of two days. Axis label:
CS-reinforcement interval in seconds. (Adapted from
Bersh, 1951. Copyright 1951 by The American
Psychological Association and reproduced by permission.)

INTERTRIAL INTERVAL AND THE CONSOLIDATION HYPOTHESIS

A significant interval in all learning is the time between two
learning trials or the time between a learning trial and some test of the
amount of learning. There are a great many studies of learning in which
trials are closely spaced, usually referred to as massed practice,
for one group, and for another group are more widely spaced, usually
called spaced practice. It is a common finding that learning progresses
more rapidly with spaced practice than with massed practice.
Unfortunately, there are usually only two degrees of spacing and little
agreement on how much time must separate trials to constitute the most
efficient conditions of learning.

In studies in which more than two intervals between trials have
been used, no simple increase in the efficiency of learning arose from
increases in the amount of time between trials. For example, in a study
of intertrial interval in eyelid conditioning, Spence and Norris (1950)
gave four groups of human subjects 100 classical conditioning trials with
variable intertrial intervals with means of 9, 15, 30, and 90 seconds for
the various groups. Figure 4.12 shows that it is generally true that the
longer the intertrial interval the more effective is the same number of
trials, but performance was somewhat better at 15 seconds than at 30.
That such findings might not be accidental can be seen in a study by
Kamin (1963) in which he gave rats 25 training trials in which they
avoided shock in a shuttle box. He then gave them another block of
25 trials with varying amounts of time between the two blocks of trials.
Figure 4.13 shows that the performance of the animals in the second
block was significantly poorer when either one or six hours intervened.
From the data of these two studies, and others, it is possible only to
generalize that some complex process is occurring during a significant
interval after a single practice trial or after a series of trials, and that
either a test or further training is likely to be affected by this process.

Figure 4.12

Curve showing differences in conditioning resulting
from different intervals between conditioning trials.
The intervals were variable, ranging from 6-12,
10-20, 20-40, and 60-120, and are plotted at the
arithmetic mean value. The data are the percent CRs
in the last 40 of 100 trials. (Adapted from Spence
and Norris, 1950. Copyright 1950 by The American
Psychological Association and reproduced by
permission.)

Figure 4.13

Curve showing the number of conditioned responses
in a second training session depending upon the
amount of time between the first and the second
session. (Data drawn from Kamin, 1963. Copyright
1963 by The American Psychological Association and
reproduced by permission.)

A common speculation as to the nature of that process is that an
effective stimulus or a bit of behavior sets up an active neural process
that continues for a significant period of time after the stimulus has been
removed or the response has ceased. Such a process is called a
perseverative stimulus trace or a perseverative memory trace. It is a
further speculation that perseveration is necessary to produce the
semipermanent structural changes on which the long-term effects of
practice, and thus learning and memory, depend. Perseveration of the
trace is said to produce consolidation of the semipermanent memory trace.

There is a large number of studies based on this hypothesis. They
take the general form of providing a given amount of training or
experience, then subjecting the organism to some procedure such as the
application of electroconvulsive shock (ECS) which is presumed to
disrupt the perseverative consolidation process. By varying the interval
between the practice and the application of ECS, differing amounts of
permanent memory are thought to be “laid down” before the disruption
prevents further consolidation. The results of one of many such studies
are shown in Figure 4.14. These are results from a study by King (1965).
He trained six groups of animals to run down a passage to a water
compartment when they were thirsty. On the twenty-first trial, five of the
groups received a shock on the feet on reaching the water compartment.
Four of these groups received ECS 75, 300, 900, or 3,600 seconds after
the experience of the shock on the feet. King tested for memory of
the experience of the shock on the feet by replacing the animals in the
runway and measuring how long it took them to return to the water
compartment. In Figure 4.14 it is apparent that the animals who were
subjected to ECS 75 seconds after the experience of the foot shock
returned rather quickly, as if they did not remember being shocked. If
the ECS was delayed longer, the animals took longer to return until, at
an interval of 3,600 seconds (one hour), the animals took an average of
about 234 seconds to return. Since a group that was shocked but did not
have ECS took about 238 seconds, it is presumed that a delay of one hour
was sufficient, in this case, to permit consolidation of the memory of the
experience. A sixth group that was given ECS, but was not shocked on
the feet, took only 10 seconds to return to the water compartment.

Figure 4.14

Effect of interval between experience of shock on
the feet and electroconvulsive shock (ECS) on the
tendency to return to the water compartment in
which the foot shock was received. Axis label:
interval between foot shock and ECS (in seconds).
(Data drawn from King, 1965. Copyright 1965 by The
American Psychological Association. Reproduced by
permission of the author and publisher.)
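
The comparison among King's groups can be expressed as simple
arithmetic. The sketch below uses only the three latencies quoted in the
text (10, 234, and 238 seconds); the `consolidation_index` itself is a
hypothetical summary measure introduced here for illustration and is not
a statistic reported by King.

```python
def consolidation_index(latency_s, ecs_only_s=10.0, shock_only_s=238.0):
    """Fraction of the foot-shock effect that survives ECS.

    0.0 means the animals return as fast as the group that received ECS
    but no foot shock; 1.0 means they return as slowly as the group that
    received foot shock but no ECS.
    """
    return (latency_s - ecs_only_s) / (shock_only_s - ecs_only_s)


# ECS delayed 3,600 seconds: mean return latency of about 234 seconds.
one_hour = consolidation_index(234.0)
print(f"ECS after 3,600 seconds: index = {one_hour:.2f}")
```

On this measure the one-hour group retains nearly all of the effect of
the foot shock, which is the sense in which consolidation is presumed to
be essentially complete by that time.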

A number of other procedures have also been employed either to
facilitate the trace or to disrupt it and prevent consolidation. For example,
McGaugh, Westbrook, and Thomson (1962) and Breen and McGaugh
(1961) report facilitation of learning through injection of drugs. The
assumption is that the drugs produce, in some manner, a more vigorous
perseverative trace and thus greater learning. On the other hand,
Pearlman, Sharpless, and Jarvik (1961) used other drugs to disrupt the
trace. They report that ether was effective only if administered very
quickly after training. Pentobarbital was somewhat more effective,
giving some evidence of trace disruption when applied as much as 10
minutes after the learning. Pentylenetetrazol appeared to disrupt the
trace completely when administered within eight hours, and showed
some effect when administered four days later.

While the study by King and a number of others like it seem to have
established that there is a perseverative consolidation process that is
necessary for the storage of permanent memory, many psychologists feel
that the effects of ECS and drugs can be explained in other ways. For
example, Coons and Miller (1960) have suggested that most of the effects
of ECS can be accounted for on the basis that ECS constitutes an
unpleasant experience and that the animal learns to avoid the situation
that led to it. Lewis and Maher (1965) review the literature and conclude
that ECS produces an inhibition which becomes conditioned to the ECS
situation and prevents performance of the response. Glickman (1961),
on the other hand, reviews the literature and concludes that the over-all
weight of the evidence favors some mechanism of consolidation in spite
of alternative interpretations of individual experiments.

TRANSFER OF TRAINING 

For learning to be useful to the learner, he must be able to transfer
his learning from the training situation to other situations. The problem
of transfer of training has given rise to three primary areas of laboratory
research. The term stimulus generalization refers to the fact that a given
response can be elicited to some degree by a range of similar stimuli. 
Response generalization is a term that applies to the fact that the same 
stimulus can be shown to produce a range of responses, either within a 
single trial or between trials depending on the nature of the situation. 
The concept of generalization is inseparable from the capacity of the
organism to discriminate between similar stimuli and responses. Thus
the third basic concept of transfer of training is that of discrimination.
Research on these three concepts will be discussed in turn.

STIMULUS GENERALIZATION 

While the term applies to the capacity of a range of stimuli to
produce unconditioned as well as conditioned responses, demonstrations
of stimulus generalization usually proceed by first establishing the
capacity of a stimulus to elicit the response in question through
conditioning, and then testing the capacity of other stimuli to produce
some aspect of that response. A demonstration of the generalization of a
conditioned response is shown in Figure 4.15. In two studies, Hovland
(1937a, b; Hull, 1943) measured generalization to differences in pitch
and loudness.



Figure 4.15

Generalization gradients for tonal frequency and
intensity, both measured in terms of the number of
just noticeable differences (jnd) in pitch or intensity
from CS. The response measured is the amplitude of
the galvanic skin response (GSR). Axis label: jnd
steps from CS. (The data are from Hovland, 1937a.
Copyright 1937 by The Journal Press. The drawing is
adapted from Figures 42 and 43 of Principles of
Behavior by C. L. Hull. Copyright 1943 by
D. Appleton-Century Co. Reprinted by permission of
Appleton-Century-Crofts.)

He first paired a tone with shock in human subjects, and conditioned the
galvanic skin response to the tone. Then he tested by presenting tones
that differed from the CS by 25, 50, or 75 jnd’s. (A jnd is a “just
noticeable difference” and is obtained by a psychophysical method that
involves finding a tone that can just be distinguished as being different
from a reference tone. It is assumed that jnd’s are psychologically equal.)
The figure is plotted to show that either increasing or decreasing the pitch
produced a decrease in the amplitude of the GSR, the greater the change,
the less GSR. In the case of intensity generalization, the decrease in GSR
with differences in intensity from the original CS was relatively small,
even over a wider range of jnd’s in intensity. There is a further
complication in testing for intensity generalization. In the Hovland study,
the more intense the test stimulus, quite independent of the generalization
problem, the greater the GSR. The curve plotted in Figure 4.15 is
composed by combining subjects who were tested with more intense
stimuli than the original CS with subjects who were tested with stimuli
which are less intense. Thus the curve is assumed to reflect only the
generalization effect. The symmetry of the two curves in the figure is
produced by plotting the same data twice, once in each direction from
the training stimulus. Thus it represents a combination of empirical fact
and assumptions concerning the nature of generalization.

This mixture of reasonable fiction and actual fact arises from both
logical and technical difficulties in producing an unambiguous
demonstration of stimulus generalization. Logically, no two stimuli can
ever be absolutely identical. Therefore there is some problem involved in
explaining how learning takes place at all, since each presentation of a
stimulus involves a “new stimulus.” To say that stimuli are “similar” is
of little help until the nature of similarity is specified. Technically, the
demonstration of generalization is difficult because great time and effort
are usually expended in establishing the response. At the first test with a
“similar” stimulus, a response can be measured, but no further
uncomplicated tests can be made. On the first test, the response must be
either reinforced, thus producing learning in the presence of that
stimulus, or not reinforced, thus producing extinction to that stimulus.
The inefficiency of extensive training followed by a single test, along
with the necessity for a very large number of tests to establish the
precise form of a generalization curve, has so far prohibited an
unambiguous empirical solution.

What one means by “similarity” constitutes a problem. One
possibility is to regard stimuli as being similar or equivalent to the extent
to which they produce the same response. This is the “stimulus
equivalence” position. Thus one cannot presume to predict generalization;
one can only try stimuli and classify stimuli according to whether some
degree of the response occurs. This position not only does not allow the
prediction of generalization, but does not permit one to plot a
generalization gradient, for there is no scale against which the amount of
the response can be plotted. To plot a gradient of stimulus generalization,
it is necessary to make some assumption about the similarity of stimuli.
The two most common “independent” definitions of similarity involve the
use of either a physical scale or a psychological scale as the dimensions in
terms of which the stimuli are arranged. Thus Hovland might have used
the physical frequency of the tone and plotted the magnitude of the
GSR response against vibrations per second. Problems arise, however,
from this solution. Blackwell and Schlosberg (1943) demonstrated that
a tone one octave away from the training tone produced a greater
tendency to respond than did tones on either side of it. Thus, with a
training tone of 10,000 cps, rats responded as if a tone of 5,000 cps was
more similar than tones of 8,000 or 7,000 cps. Furthermore,
generalization may occur between stimuli for which there is no
underlying physical dimension. In many learning situations, for example,
one might wish to predict the degree of generalization from one stimulus
to another where the stimuli do not share a relevant physical dimension.
In such cases, generalization must be measured along some psychological
dimension, such as similarity in meaning or similarity in sound, that must
be measured independently from the response under study. Even this
solution poses difficulties. Guttman and Kalish (1956) trained pigeons to
peck at a key illuminated by a color produced by a relatively narrow
band of wavelengths and then tested for generalization to other
wavelengths. When the generalization curve obtained in this way was
compared to a curve showing the discriminability of the various
wavelengths, it was concluded that the pigeons were generalizing in
response to the physical properties of the stimulus (such as frequency)
rather than the psychological properties (such as jnd’s). The implication
of this conclusion has been challenged, however, by Shepard (1965), who
derived the Guttman and Kalish generalization gradient from the
discrimination function, thus implying that the generalization function
and the discrimination function had a common origin. Furthermore,
Kalish (1958) carried out a similar study with human subjects and found
generalization to conform to the psychological scale rather than the
physical one. Thus it is not necessarily true that a scale of “similarity”
obtained with one set of measuring techniques will define the dimension
the organism responds to in another.

Explanations of generalization take the form of specifying what it
is that is common to the generalized response and the scale of similarity
against which it is plotted (such as the jnd scale in Figure 4.15). The
three most common explanations are phrased in terms of (1) an
underlying psychological scale, (2) common stimulus elements, or (3)
mediation by some common association. In Hovland’s study, it was
assumed that the independent measurement of difference thresholds of
pitch reflected an underlying dimension of similarity in pitch, and that the
GSR response reflected this same scale. According to the second
explanation, stimuli are composed of large numbers of small, unitary
elements. The CS is composed of one large set of elements, and the
amount of generalization is thought to be determined by the proportion of
this set of elements that occurs in a test stimulus. Explanation in terms of
mediation suggests that the connection between stimulus and response is
not direct, but is rather mediated by some response with which both have
been associated. All three explanations are theoretical and define
“similarity” in terms of an underlying property that is assumed to be
common to both sets of measurements (e.g., jnd’s and GSR).
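
The common-elements explanation lends itself to a brief illustration.
The sketch below treats each stimulus as a set of abstract elements and
computes the proportion of the CS elements that also occur in a test
stimulus. The element sets and the function name
`generalization_by_common_elements` are hypothetical; nothing here is
taken from a particular experiment.

```python
def generalization_by_common_elements(cs_elements, test_elements):
    """Proportion of the CS element set that also occurs in the test stimulus."""
    cs = set(cs_elements)
    if not cs:
        raise ValueError("the CS must contain at least one element")
    return len(cs & set(test_elements)) / len(cs)


# Hypothetical stimuli: the CS is made up of elements 0..99.
cs_tone = set(range(100))
near_tone = set(range(20, 120))   # shares elements 20..99 with the CS
far_tone = set(range(70, 170))    # shares elements 70..99 with the CS

print(generalization_by_common_elements(cs_tone, near_tone))
print(generalization_by_common_elements(cs_tone, far_tone))
```

On this account a test stimulus that shares more elements with the CS is
expected to produce more of the conditioned response, so the overlap
proportion plays the role of the similarity scale.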

The complexities of the problems of stimulus generalization have 
produced a great many experiments and theoretical arguments. An 
excellent .summary of the exp<‘riments and the theoretic-al issues under- 
lying them may be found in Kimble ( 1961 ). 

RESPONSE GENERALIZATION 

Similarity between responses poses an even more difficult problem
than that of similarity between stimuli. It is clear that when one response
is conditioned to a stimulus, other responses are associated to some
degree, but a problem arises in trying to establish a dimension of
response similarity. No simple solution to the problem has been
developed. Possibly the most useful idea is Hull’s (1943) concept of
habit-family hierarchy. He conceived the organism as being born with a
family of responses that are more likely than others to occur and to
produce satisfaction when a biological need arises. He thought of these
responses as being arranged in a hierarchy in terms of their probability of
occurrence. If one analyzes the behavior of a newborn pup, for example,
the sequence of responses that lead to nursing for the first time can be
thought of as being composed of a number of distinguishable responses
involving movements of the forelegs, hind legs, body, and mouth. They
vary in likelihood of occurrence, but they are not random. When a pup
first nurses successfully, according to Hull, the original hierarchy of
responses is reordered in terms of their probability of occurrence, with
those responses that led to reinforcement increasing in probability and
those that did not lead to reinforcement decreasing in probability. Thus
in any given situation, the responses that will occur can be ordered in
terms of their probability of occurrence, the ordering described as a
hierarchy, and the hierarchy rearranged on the basis of the reinforcement
experience. This conception of response generalization is very similar to
the stimulus-equivalence conception of stimulus generalization, in that
no underlying “dimension of similarity” is identified with either.
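
The reordering of a habit-family hierarchy can be sketched in a few
lines. The update rule below (reinforced responses move a fixed fraction
of the way toward 1, the others shrink by the same fraction, and the
result is renormalized) is an assumption made for illustration and is not
Hull's own formulation; the response names and starting probabilities are
likewise hypothetical.

```python
def reorder_hierarchy(probabilities, reinforced, rate=0.5):
    """Return a new hierarchy (most probable response first) after one reinforcement."""
    updated = {}
    for response, p in probabilities.items():
        if response in reinforced:
            updated[response] = p + rate * (1.0 - p)   # reinforced: probability rises
        else:
            updated[response] = p * (1.0 - rate)       # not reinforced: probability falls
    total = sum(updated.values())
    normalized = {r: p / total for r, p in updated.items()}
    # Sort so that the dictionary order reflects the hierarchy.
    return dict(sorted(normalized.items(), key=lambda item: item[1], reverse=True))


# Hypothetical initial hierarchy for a newborn pup.
pup = {
    "foreleg push": 0.4,
    "head turn": 0.3,
    "mouth movement": 0.2,
    "hind leg kick": 0.1,
}
after = reorder_hierarchy(pup, reinforced={"mouth movement"})
for response, p in after.items():
    print(f"{response}: {p:.2f}")
```

Note that the sketch, like the habit-family conception itself, orders
responses only by probability of occurrence and assumes no dimension of
similarity among them.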
DISCRIMINATION

Generalization implies discrimination. A difference in response must
be based on a perceived difference in stimulus. Sometimes generalization
is described as a failure to discriminate. However, the human subjects in
Hovland’s experiment shown in Figure 4.15 were certainly able to
discriminate between the different pitches, even though they responded
to pitches different from the training pitch. Therefore, the subjects could
discriminate at one level (by detecting just noticeable differences in 
pitch) and failed to discriminate at another level (by giving a GSR 
to a pitch other than the training pitch). 

In discrimination learning, one stimulus is usually followed by
reward and another stimulus followed by nonreward. For example, a rat
might be rewarded for choosing a dark gray alley and not rewarded for
choosing an alley painted a lighter shade of gray. Under these
circumstances, the animal will show learning by coming to choose the dark
alley most, if not all, of the time. 

There are two prominent approaches to the explanation of discrimi- 
nation learning. In one, the tendency to respond to the dark gray is ex- 
plained by the accumulation of habit strength through reinforcement, 
while the tendency to respond to the light gray stimulus is said to be 
extinguished through nonreinforcement. Both tendencies are considered 
to be generalized, but the performance comes to a high level when the 
difference in habit strength to the light and the dark stimuli is sufficiently 
great that discrimination is established. 
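
The habit-strength account can be illustrated with a toy simulation.
The linear update rule, the learning rate, and the generalization
fraction below are all assumptions chosen for illustration and do not
reproduce any published model; the function name `train_discrimination`
is hypothetical.

```python
def train_discrimination(trials, rate=0.2, gen=0.5):
    """Simulate habit strength to a dark (rewarded) and a light (nonrewarded) gray.

    Each reinforced response to dark raises its habit strength, and a
    fraction `gen` of that gain generalizes to light. Each nonreinforced
    response to light extinguishes its strength, and a fraction `gen` of
    that loss generalizes to dark.
    """
    dark, light = 0.0, 0.0
    differences = []
    for _ in range(trials):
        gain = rate * (1.0 - dark)      # reinforcement of the response to dark
        dark += gain
        light += gen * gain             # generalized habit strength
        loss = rate * light             # extinction of the response to light
        light -= loss
        dark -= gen * loss              # generalized extinction
        differences.append(dark - light)
    return dark, light, differences


dark, light, differences = train_discrimination(30)
print(f"dark = {dark:.2f}, light = {light:.2f}")
print(f"difference after trial 1 = {differences[0]:.2f}, "
      f"after trial 30 = {differences[-1]:.2f}")
```

In this sketch both tendencies are generalized on every trial, yet the
difference between them grows with training, which is the condition the
account requires for discrimination to appear.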

The other prominent explanation is that an animal in a discrimination
situation learns a relation between the stimuli, and, in the example,
comes to choose the darker of the pair. This leads to the expectation that
an animal trained in the manner described will choose the darker of a
new pair of stimuli, thus transposing the relationship he has learned from
the original stimuli. If the animal is asked to choose between the dark
stimulus that has previously led to reward and a very much darker
stimulus, he will be expected to respond to the relation of the stimuli
rather than to their absolute properties, and to choose the very much
darker stimulus. This result has been observed and is called the
transposition phenomenon. In spite of the apparent contradiction between
these results and an explanation in terms of habit tendencies with
respect to absolute properties of the stimuli, an explanation can be and
has been developed by Spence (1937). For an account of the theoretical
issues involved in explanations of transposition, and a review of the
relevant experimental studies, see Kimble (1961, pp. 378ff.).
ELIMINATION OF LEARNED RESPONSES

EXPERIMENTAL EXTINCTION 

If, after a CR is established, a CS is repeatedly presented alone
and is not followed by the US, the usual result is a gradual reduction
in the amplitude of the CR. The term experimental extinction was
originally used to refer to the procedure of omitting the US and thus
presenting the CS alone. However, in general usage, the term has
gradually come to refer to the disappearance of the CR through a variety of
procedures other than the omission of the US. In selective learning
through reinforcement, experimental extinction refers to the procedure
of omitting the reward and also to the disappearance of the response.
Thus the term “experimental extinction” is used both as a name for
procedures and as a name for the results of those procedures.

The course of experimental extinction of a classical conditioned
response may be seen in Figure 4.16. These data are taken from a study
by Hovland. The GSR response had been conditioned to a tone by pairing
the tone with a shock to the wrists of human subjects. The figure is a plot
of the magnitude of the GSR as a percentage of the amplitude on the
first extinction trial. The response on this first trial occurs, of course,
before the subject has experienced the first omission of the shock and
therefore represents the amplitude at the end of training. The decrease
in the amplitude of response occurs quite abruptly in this experiment
and appears to reflect the disappearance of the effects of training after
as few as five trials of experimental extinction. Very similar curves can
be obtained in selective learning when, after a period of training, the
reward is omitted.
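The scaling used for such a curve, in which each amplitude is expressed as a percentage of the amplitude on the first extinction trial, can be sketched as follows. The amplitudes are hypothetical values chosen only to show the arithmetic; they are not Hovland's data.

```python
def percent_of_first_trial(amplitudes):
    """Express each amplitude as a percentage of the first-trial amplitude."""
    first = amplitudes[0]
    if first == 0:
        raise ValueError("first-trial amplitude must be nonzero")
    return [100.0 * a / first for a in amplitudes]


# Hypothetical GSR amplitudes (arbitrary units) on successive extinction trials.
hypothetical_gsr = [16.0, 12.0, 6.0, 3.0, 2.0, 1.0]
percentages = percent_of_first_trial(hypothetical_gsr)

for trial, pct in enumerate(percentages, start=1):
    print(f"extinction trial {trial}: {pct:.2f} percent")
```

By construction the first trial is always 100 percent, which is why it can stand for the amplitude at the end of training.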

There are a number of empirical phenomena associated with the
process of experimental extinction that make it quite clear that while
experimental extinction as a procedure is quite simple, it is quite complex
as a process.

AMOUNT OF TRAINING 

It is generally true that the greater the amount of training, the
stronger the habit, and the greater the resistance to extinction. This
principle is so widely accepted that the number of trials to extinguish
a response is taken as the measure of the strength of the habit. For
example, in Figure 4.2 the number of trials to extinction is used as a
measure of the amount of habit strength developed through different
numbers of reinforcements (Harris and Nygaard, 1961). Furthermore,
some degree of “overtraining” produced by continuing reinforcement
beyond the point at which the animals meet the criterion of learning
produces an increase in resistance to extinction. Mackintosh (1963),
among others, reports such a result. However, the opposite result may
be obtained if the problem is simple and the overtraining extended.
For example, Ison (1962) trained hungry rats in an extremely simple
problem, running down a straight alley to obtain food. The problem is
so simple that animals learn to run at near maximum speed in a very
few trials, fewer than the 10 trials Ison used as a minimum. He trained
six groups which had 10, 20, 40, 60, 80, or 100 reinforced trials before
experimental extinction was instituted by removal of the reward. He used
several different criteria for extinction, but the most dramatic results
appeared with a criterion that the animal not enter the box in 120
seconds. These results are plotted in Figure 4.17, where it is clear that
the greater the number of reinforcements, the faster extinction occurred.
These results are consistent with the results shown in Figures 4.3 and 4.4
which indicate that response strength may rise and fall with continued
reinforcement.

Figure 4.16

The course of experimental extinction of a conditioned
GSR response in human subjects. The amplitude
of the GSR on the first trial of experimental
extinction is taken as 100 percent, and the amplitudes
of subsequent responses are expressed as
percentages of that amount. (Horizontal axis: experimental
extinction trials. The figure is replotted
from Figure 57 of Principles of Behavior by C. L.
Hull. Copyright 1943 by D. Appleton-Century Co.
Reprinted by permission of Appleton-Century-Crofts.)

Figure 4.17

Decrease in resistance to extinction with overlearning.
Ison (1962) gave varying numbers of reinforced
responses in a simple runway before beginning
extinction. (Horizontal axis: number of reinforcements
before extinction. Adapted from Ison, 1962. Copyright
1962 by The American Psychological Association
and reproduced by permission.)

FORGETTING 

There is general agreement that most of the responses conditioned 
in the laboratory are not forgotten, even over long periods of time. The 
author once conditioned leg flexion in a hunting dog and then gave the 
dog to a farmer. Two years later the dog was tested and showed no
forgetting of the response. Kimble (1961) lists a number of reports
showing little if any forgetting of a variety of responses in a number of 
different organisms. 

DISINHIBITION

If an extraneous stimulus is introduced during the experimental
extinction procedure, the result is frequently an increase in the
response on that trial. Pavlov (1927) thought of the process of
experimental extinction as being an inhibition of the response, and thought of
the effect of the extraneous stimulus as being one of inhibition of the
inhibition, and thus the production of a greater response. Figure 4.18
shows the effects of extraneous stimuli on an extinguished salivary
response. Zavadsky, a student of Pavlov's, conditioned a dog to salivate
using the sight and smell of the food powder as the CS, and ingestion
of the food as the US. Then he extinguished the response quickly by
presenting the sight of the meat powder at a distance for one-minute
periods every five minutes. The response was measured in terms of the
number of drops of saliva secreted from each of two fistulated salivary
glands, the submaxillary and the parotid. After the response in both
glands had extinguished, disinhibition was produced first by applying
tactile stimulation along with the sight of the meat powder and then
by knocking on the table when the powder was presented. The
extraneous stimuli produced salivation in both glands. After one more trial
with the CS alone, which did not elicit a response, Professor Pavlov
entered the room and talked for two minutes. After he did, the meat powder
was again presented at a distance and appreciable salivation occurred.

Figure 4.18

Extinction and disinhibition of a conditioned salivary
response in a dog. Salivary response was trained to
the sight of meat powder, and the response was measured
in terms of the drops of saliva from two glands,
the submaxillary and the parotid. After extinction,
the response reappeared with tactile stimulation, a
knock on the table, and after Professor Pavlov had
entered the room and talked between trials. (Figure
is redrawn from I. P. Pavlov, Conditioned Reflexes,
1927 [Clarendon Press, Oxford], by permission.)

SPONTANEOUS RECOVERY 

An extinguished conditioned response sometimes reappears
spontaneously after a lapse of time. Figure 4.19 shows spontaneous recovery
of a salivary response in one of Pavlov's dogs (1927) after a lapse of
two hours in which the dog had been left alone. While the response in
the figure recovered only about 15 percent of the original strength, much
larger amounts of spontaneous recovery are not uncommon. Ellson
(1938), for example, varied the amount of time between the end of
experimental extinction and the test for spontaneous recovery of a
lever-pressing response in rats and found a steadily increasing amount of
spontaneous recovery the longer the interval. After about three hours,
the response recovered to slightly less than 50 percent of the original
response strength. If the extinction procedure is continued for a sufficient
number of trials after the response has disappeared, no spontaneous
recovery will occur. This procedure is sometimes referred to as
“below-zero” extinction.

Figure 4.19

Spontaneous recovery of a conditioned salivary response.
The response to the sight of meat powder without ingestion
extinguished within six trials. After a two-hour
period, the sight of meat powder again elicited a
significant salivary response. (Figure is redrawn from
I. P. Pavlov, Conditioned Reflexes, 1927 [Clarendon
Press, Oxford], by permission.)

AMOUNT OF EFFORT 

The number of trials to extinguish the response can be shown to 
decrease as the amount of effort required to make the response is 
increased. For example, Applezweig (1951) varied the number of grams of
pressure required for rats to press a lever during both learning and
extinction. He found that the response strength increased with the
amount of effort required during training, but decreased with the amount
of weight during extinction.

PUNISHMENT FOR RESPONDING 

If punishment for responding is combined with omission of the
US or omission of the reward, the usual result is rapid disappearance of 
the response. For example, Seligman and Campbell (1965) trained rats 
to avoid shock in a runway. During extinction the animals were pun- 
ished for running by receiving shock as they entered the goal box. Sev- 
eral intensities and several durations of shock in the goal box were used. 
They found just what one might expect, that the more intense or the 
longer the punishment for running into the goal box, the slower the 
animals ran and the quicker they stopped running altogether. However, 
punishment can produce enhancement of the response under some 
conditions, as will be noted later. 

CONDITIONS PRODUCING GREAT RESISTANCE TO EXTINCTION

There are several training procedures which produce very persistent 
and almost intractable responding. Avoidance-conditioning procedures 
often produce learning and high resistance to extinction in a very few 
trials. For example, in a standard procedure used with dogs, a tone might 
be used as a CS, and the response elicited by a shock on the forepaw. 
If the apparatus is designed to permit the dog to avoid the shock
altogether by lifting his paw before the shock appears, avoidance may be
established in as few as two or three trials. It may then persist beyond
the patience of the experimenter (possibly 500 trials) without noticeable
evidence of extinction, even though the shock is no longer given no
matter what the animal does. Punishment of an avoidance response,
under certain conditions, also leads to persistence of responding. The
contrast between this effect of punishment and the more usual effect of
hastening extinction is referred to as the punishment paradox. This
problem is discussed on page 80 in a general treatment of the reinforcing
effects of punishment. High resistance to extinction can be produced by
partial or intermittent reinforcement in contrast to 100 percent
reinforcement of the response. (This problem is discussed in the next
chapter, Operant Conditioning.)

EXPLANATIONS OF EXPERIMENTAL EXTINCTION 

There is a wide range of explanations that could be and have been 
offered for the gradual disappearance of the response when either the US
is omitted or the reward no longer follows the response.

DECREASE IN HABIT STRENGTH 

If habit strength rises during reinforcement, it is a logical possibility 
that it falls under nonreinforcement. This possibility holds little appeal,
for habit strength is defined as that component of the determiners of 
performance that is semipermanent and relatively immutable. 

DECREASE IN INCENTIVE 

The most obvious difference between acquisition and extinction is
the presence during acquisition of the US or the reward, and the
absence of one or the other during extinction. If cessation of the US in
conditioning is regarded as an incentive, as is the reward in selective
learning, then extinction can be explained in terms of a decrease in the
incentive component or term in the equation

$$P = D \times H \times I.$$

This explanation requires that, with D constant during acquisition, both
H and I increase with increased training; during extinction, performance
decreases because the value of I decreases.
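The incentive account can be made concrete with a small numerical sketch of the multiplicative relation among drive (D), habit strength (H), and incentive (I). All values below are hypothetical and serve only to show that performance can fall to zero while habit strength stays untouched.

```python
def performance(drive, habit, incentive):
    """Performance as the product of drive, habit strength, and incentive."""
    return drive * habit * incentive


# Hypothetical values: drive and habit strength are held constant,
# while incentive declines over successive extinction trials.
drive = 1.0
habit = 0.8
incentive_by_trial = [1.0, 0.75, 0.5, 0.25, 0.0]

performance_by_trial = [
    performance(drive, habit, incentive) for incentive in incentive_by_trial
]

for trial, value in enumerate(performance_by_trial, start=1):
    print(f"extinction trial {trial}: performance {value:.2f}")
```

Because the terms multiply, a zero incentive is enough to eliminate the response even though the habit term is as large as it was at the end of training.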

CHANGES IN DRIVE 

It seems likely that D is not constant between acquisition and
extinction conditions. If a noxious US is used during the acquisition
phase, then its omission during extinction should produce a lower D
value, possibly even a zero value of D. Unfortunately, avoidance
conditioning produces high resistance to extinction, even though the US is
avoided, and by this logic, the response should disappear. Omission of
the reward, on the other hand, should theoretically increase the value
of D. When the organism is rewarded, the reward is usually conceived
as reducing the drive which is instigating the response. Omission of the
reward during extinction should produce a sustained D value in place
of reduction in D. Also, the removal of reward might be regarded as
frustrating. Frustration might be regarded as having aversive drive
properties and thus might produce an increase in D during extinction
as compared to acquisition. (This possibility has been dealt with by
Amsel and Roussel, 1952, and Amsel and Ward, 1954.) Therefore, changes
in the value of the D term in the equation might explain some of the
special findings in studies of experimental extinction, but the changes do
not appear to offer a plausible account of the disappearance of the
response under simple experimental extinction conditions.

EXTINCTION AS THE LEARNING OF A DIFFERENT RESPONSE

In most learning situations, the growing strength of the measured 
response is accompanied by the disappearance of other, usually unmea- 
sured, responses. During experimental extinction, these unmeasured 
responses are frequently observed to reappear. For example, in a simple 
runway task, the animal may be observed to explore his environment 
and to engage in preening and grooming behavior as well as to move 
eventually from the starting box to the goal box. During acquisition,
these irrelevant responses disappear. In fact, most of the increase in 
speed of running occurs as a function of the disappearance of these 
irrelevant, time-consuming, and incompatible responses. During extinc- 
tion they may reappear. The equation used above to describe the factors 
which determine performance refers only to the measured response. 
Consideration of the role of incompatible responses requires that per- 
formance be determined by both an expression of the value of the 
positive-response tendency and one expressing the strength of the incom- 
patible response. Thus, if we use p as a notation for the positive response 
and i as a notation of the incompatible response, we can write an expres- 
sion for performance as a function of both responses: 

$$P_p = (D_p \times H_p \times I_p) - (D_i \times H_i \times I_i).$$
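A small numerical sketch may help in reading the expression. It treats each response tendency as a product of its own drive, habit, and incentive terms and subtracts the incompatible tendency from the positive one. The trial-by-trial values are hypothetical, and the simple subtraction is exactly the algebraic form whose justification is in question.

```python
def tendency(drive, habit, incentive):
    """Strength of one response tendency as a product of its three terms."""
    return drive * habit * incentive


def net_performance(positive, incompatible):
    """Positive-response tendency minus incompatible-response tendency.

    Each argument is a (drive, habit, incentive) tuple.
    """
    return tendency(*positive) - tendency(*incompatible)


# Hypothetical extinction trials: the incentive for the positive response
# falls while the habit strength of the incompatible response grows.
positive_incentives = [1.0, 0.75, 0.5, 0.25]
incompatible_habits = [0.0, 0.25, 0.5, 0.75]

net_by_trial = [
    net_performance((1.0, 1.0, i_p), (1.0, h_i, 0.5))
    for i_p, h_i in zip(positive_incentives, incompatible_habits)
]

for trial, value in enumerate(net_by_trial, start=1):
    print(f"extinction trial {trial}: net performance {value:.3f}")
```

In this sketch the measured response disappears once the incompatible tendency exceeds the positive one, without any loss in the habit strength of the positive response.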

A great many problems arise in attempting to explain experimental 
extinction in this way. There is some reason to believe that all habits 
are influenced by all drives present, and the expression above indicates 
that the two drives are as incompatible as the two habits. Whether the 
simple algebraic form of the expression is justified or whether different,
and possibly more complicated, relationships are involved is also a
problem. Among the many other problems is that of the characterization
of the incompatible response.

One characterization of the incompatible response is that of Hull
(1943). He defined not responding as a response. He pointed out that
the effort involved in making the response had aversive properties. He
described the aftereffects of effort as a negative drive called reactive
inhibition, which was a drive against making the response again. He
thought of the dissipation of reactive inhibition as being reinforcing,
and it reinforced the tendency not to respond. The habit that was built
up in this way he called conditioned inhibition, which amounts to a
learned tendency not to make the response.

The incompatible response has also been conceived of as a positive 
response that is simply physically incompatible with the response being 
measured. For example, it is a common observation in conditioning 
studies that the initial presentations of the CS elicit orienting responses 
followed by the overt response to the US. As conditioning proceeds, the
orienting response tends to disappear as the conditioned response grows
in strength. During extinction, the orienting response may reappear. 
Berlyne (1960) reviews much of the work on the orienting response, 
much of which was done in Russia. In this work, the orienting response 
is often regarded as the overt manifestation of attention. 

Whatever the explanation of normal extinction, conditioned
responses can be eliminated through the positive conditioning of
incompatible responses. The process is called counter conditioning and forms
the theoretical and procedural basis for some forms of behavioral therapy 
in clinical psychology. 

CHANGES IN THE STIMULUS PATTERN BETWEEN ACQUISITION AND EXTINCTION

When the US is omitted or the reward removed during extinction, 
not only is the source of reinforcement removed, but important changes 
are made in the stimulus pattern between the conditions of acquisition 
and the conditions under which the response is extinguished. This fact has
led to an explanation in terms of generalization. The response during
extinction is regarded as a generalized response of lesser strength than
the original. Efforts to manipulate the amount of stimulus change
between acquisition and extinction must be evaluated in the light of the
effects of stimulus variation on acquisition. Thus, for example, Voeks
(1954) carried out an eyelid-conditioning study in which she exercised
unusual control over stimulus variation. She had the subjects keep their
eyes closed between trials and open them on signal, hold their breaths
during the trial, and initiate the CS themselves by pressing two keys.
All of these operations were designed to make the stimulus pattern as
nearly identical as possible during the acquisition trials. Under these
conditions, learning was very rapid, and about half of her subjects
showed conditioning in a single trial. Walker (1948) trained two groups
of rats to press a lever and then extinguished the response with one
group confined to a small area before the bar and the other free in the
larger compartment. The animals in the smaller area showed higher
resistance to extinction. These two studies seem to justify the conclusion
that under conditions of minimal stimulus change within acquisition
and extinction and between the two phases of training, learning is faster
and extinction is slower.

Hulicka, Capehart, and Viney (1960) deliberately manipulated the
amount of change in the stimulus pattern between acquisition and
extinction and found that the greater the change, the faster the extinction. 
Five different sets of stimuli accompanied the reinforcement during 
acquisition. They were the click of the food release mechanism, a light 
that flashed when the lever was pressed, a continuous red light near the 
rear of the apparatus, a continuous buzzer, and a false floor to cover the 
grid. Maximum responding during extinction occurred when only the 
reward was removed. Generally, the larger the number of stimuli that 
were removed along with the reward during extinction, the fewer were 
the responses that occurred during extinction. It therefore seems likely 
that the greater the difference in stimulus pattern between acquisition 
and extinction, the faster the extinction. It remains to be demonstrated, 
however, that all of the phenomena of extinction can be accounted for
on the basis of change in stimulus pattern. 


THE NATURE OF REINFORCEMENT 


Contiguity is the oldest principle of association; reinforcement is
the newest. Aristotle recognized that association occurred because of 
temporal and spatial contiguity. Reinforcement, as a principle of asso- 
ciation, is a development of the twentieth century. 

A reinforcer may be either positive or negative. A positive reinforcer
may be identified as any state an organism will undertake to approach
or achieve. A negative reinforcer may be defined as any state an organism
will undertake to reduce or to avoid. These definitions describe only the
incentive function of reinforcers. They define reinforcement in terms of
approach and avoidance and say nothing about the relation of
reinforcement to association or learning.

Reinforcers may also be defined in terms of their effects on
associational connections. Thus a positive reinforcer is sometimes defined as
one which produces a strengthening of a stimulus-response connection
or one which increases the probability of occurrence of the response. A
negative reinforcer may similarly be defined as one which weakens a
stimulus-response connection or decreases the probability of a response.
While the incentive properties of reinforcers are denied by no one, the
issue of whether reinforcers operate to strengthen associational
connections is an issue of strenuous debate that is difficult to resolve. That
negative reinforcers operate to weaken connections is a proposition that
is widely doubted.

The terms positive reinforcement and negative reinforcement are
closely associated with reward and punishment. They cannot be used
interchangeably with the latter terms, however. Reward and punishment
are general terms that can be used by any learning theorist regardless of
his position. Thus one can be a strict behaviorist and assert that
contiguity is the necessary and sufficient condition of learning and still talk
about the effects of reward and punishment. The use of the word
reinforcement implies incentive properties as a minimum, and an influence
on associative strength as an additional option.

In using the terms, most reinforcement theorists assume that positive
and negative reinforcers are inextricably associated. A drive state, such
as hunger, is regarded as a negative reinforcer and the food reward that
relieves the state is a positive reinforcer. A noxious stimulus, such as
shock, is regarded as a negative reinforcer; escape from the noxious
stimulus is a positive reinforcer. An unpleasant state, such as fear, is a
negative reinforcer; a reduction in fear is a positive reinforcer. Among
reinforcement theorists, however, the term negative reinforcement is
rarely used. Instead, the tendency is to use the terms drive or aversive
stimulus to imply a role in the instigation of action and occasionally the
term negative incentive to emphasize the tendency to avoid such a
stimulus. Neither of these meanings implies a weakening of an
associative connection. For this reason, when the term reinforcement is used
alone, as it usually is, it refers to positive reinforcement.

REINFORCEMENT AS REDUCTION OF A 
BIOLOGICAL NEED 

Considerable research has been devoted to an attempt to isolate
the reinforcing element of the situation in which a biological need is
reduced. A hungry animal sees food, consumes it, the empty stomach
becomes full, and there are then changes in the chemistry of the
bloodstream. Which of these phases constitutes reinforcement?

One approach to this problem involves the placement of a fistula in
a dog's esophagus in such a manner that either food that is swallowed
can be prevented from reaching the stomach or food can be introduced
directly into the stomach, bypassing the dog's mouth. Two studies, one by
Kohn (1951) and one by Berkun, Kessen, and Miller (1952), are nearly
identical and yield the same conclusions. In the Kohn study, animals
were trained to push a panel to obtain liquid food. They were then
prefed in three different ways. In one treatment, 14 cc of milk were taken
by mouth but not permitted to reach the stomach. In another condition,
14 cc of milk were placed directly into the stomach. In a third condition,
14 cc of saline solution were placed directly in the stomach. Kohn then
measured the rate of panel pushing for food to determine the effects of
the prefeeding. After the injection of saline solution in the stomach, the
rate of panel pushing for liquid food was about 13 pushes a minute, but
after milk had been injected in the stomach, the rate was a little over
seven responses a minute. These results seem to mean that even when
the consummatory response had been bypassed, some reinforcement had
occurred. However, when prefeeding consisted of the animal consuming
milk which did not reach the stomach, the rate of panel pushing was
only a little over 5.5 responses a minute. This result seems to indicate
that the consumption of the milk had been reinforcing even though the
biological need had not been reduced. The Berkun, Kessen, and Miller
study yielded almost identical results when the amount of food, rather
than the rate of instrumental responding, was taken as the measure.
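The comparison among the three prefeeding treatments can be put on a common footing by expressing each rate as a reduction relative to the saline condition. The rates below are the rounded figures quoted in the text ("about 13," "a little over seven," "a little over 5.5"), so the percentages are rough illustrations and not values reported by Kohn.

```python
# Approximate panel-pushing rates (responses per minute) as quoted in the text.
approx_rates = {
    "saline in stomach": 13.0,
    "milk in stomach": 7.0,
    "milk by mouth only": 5.5,
}

# The saline condition serves as the baseline for comparison.
baseline = approx_rates["saline in stomach"]

reduction_percent = {
    treatment: 100.0 * (baseline - rate) / baseline
    for treatment, rate in approx_rates.items()
}

for treatment, reduction in reduction_percent.items():
    print(f"{treatment}: about {reduction:.0f} percent below the saline rate")
```

On these rounded figures, both milk treatments lower responding well below the saline rate, and milk taken by mouth without reaching the stomach lowers it somewhat more than milk placed directly in the stomach.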

Miller and Kessen (1952) extended this fistulation and feeding
procedure to a T-maze learning situation. They wished to determine if
the animals could discriminate between the various treatments and which
treatment would be regarded by the animal as preferable or most rein- 
forcing. If an animal made an incorrect choice, it received 14 cc of saline 
solution directly in the stomach. Three different rewards were offered 
different groups. Animals that received milk in the mouth, even though it 
did not reach the stomach, learned to choose the milk side quickly. Thus 
milk in the mouth was reinforcing. However, injecting milk in the stomach 
takes considerable time and a direct comparison between milk in the 
mouth and milk in the stomach would be complicated by differential 
delay of reward. Therefore, they ran a group that received milk in the 
stomach for a correct choice and a group that received milk in the 
mouth seven minutes and 35 seconds after the choice. Both groups were 
able to discriminate and learned the problem. The group receiving 
delayed milk in the mouth performed slightly better than the group 
receiving milk in the stomach. Thus both the consummatory process 
alone and the reduction of the biological need without consumption
appear to be reinforcing. 

Both the consummatory process and food in the stomach were 
bypassed by Coppock and Chambers (1954) and Chambers (1956a, b). 
They injected glucose directly into the bloodstream in rabbits as rein- 
forcement for head turning. The animals showed a preference for glucose 
rather than xylose injected into the bloodstream, and the preference was 
greater when they were hungry. Thus, this study taken with the previous 
studies appears to establish that direct injection, stomach loading, and 
food consumption are all independently reinforcing.

Sheffield and Roby (1950) demonstrated that animals would learn
to choose the correct side of a T maze when the reward was sweet tasting
but nonnutritive saccharine. Sheffield, Wulff, and Backer (1951)
demonstrated that male rats would learn to run a straight alley to reach a
female even though copulation was interrupted before ejaculation. Both
studies can be interpreted as demonstrating that a stimulus increase can
serve as a reinforcement in contradiction to the requirement of a
reduction in either stimulation or need.

A great variety of other stimuli, objects, and conditions have been
shown to be reinforcing that cannot easily be related to biological needs.
Animals will perform a great variety of responses for a reward that
consists of a pulse of electrical current to some portions of the brain tissue
delivered through implanted electrodes (Olds and Milner, 1954; Olds,
1956). These studies are discussed in more detail in Butler (pending).
Harlow (1950) and Harlow, Harlow, and Meyers (1950) have shown
that monkeys will learn to unlock a hasp on the cage with no extraneous
motivation or reward. Butler (1954) has demonstrated that an isolated
monkey will work hard for no other reward than the chance to explore
a visual environment through a window, and will learn a discrimination
for visual exploration (1955) and auditory stimulation (1957).
Montgomery (1954) showed that rats would learn to choose an arm of a
single-choice-point maze to attain an opportunity to explore a complex
environment in preference to a simple one. Kish and Barnes (1961)
showed that mice prefer a movable lever to an unmovable one. Miles
(1958) found that kittens would learn a Y maze to reach a toy or a room
to explore almost as fast as they would learn it for food when hungry.
Pubols (1962) offered animals a choice between a fixed or a variable
delay of reward and found a preference for the variable delay. Studies
by Kish (1955), Forgays and Levin (1958, 1959), Robinson (1961),
and Levin and Forgays (1960) show that under appropriate conditions
mice will work to turn a light on or off. These studies suggest that stimulus
change is reinforcing.

EXPLANATIONS OF REINFORCEMENT 

A basic explanation of the nature of reinforcement is that it is a
reduction of a biological need. This need-reduction theory is generally
identified with Hull (1943, 1952), who specified that the reduction in a
stimulus that is characteristic of a biological need is what actually
constitutes reinforcement. This position can be retained, even in the light
of the evidence in the foregoing paragraphs, if one appeals to
secondary reinforcement or acquired reward value. Thus, all of the
stimuli closely associated with actual need reduction can acquire
reinforcing properties, and instances of reinforcement without reduction in a
biological need can be accounted for in terms of the presence of the
secondary reinforcers. If reinforcement has no obvious connection with
hunger, thirst, fear, or sexual needs, then additional drives or needs can
be postulated. Thus, it is possible to talk of a need to explore or a need
for stimulus change. In this vein, it can be shown that the longer
Butler's monkeys are isolated, within limits, the harder they will work to
earn a chance to look at electric trains or other monkeys. Likewise, Kish
and Baron (1962) showed that the lever-pressing behavior of mice to
turn lights on and off could be controlled to a degree by the character
of a pre-exposure period, dark or light, changing or unchanging.

An alternative to a need-reduction theory of reinforcement is a
stimulus-reduction theory. Miller (1951) argues that any strong stimulus
is drive producing and that the reduction in intensity of any strong
stimulus is reinforcing. Thus any stimulus may be a drive and a
reinforcer; this role is not limited to “stimuli characteristic of a need.”
Miller's argument does not readily explain studies in which a stimulus
increase is found to be reinforcing.

A common position concerning the nature of reinforcement is that 
no explanation is necessary. Thus Skinner (1938, 1948, 1953) says that 
any stimulus that increases the probability of response is a reinforcer by 
definition. He argues that this treatment is not circular. If a stimulus is
identified as a reinforcer in one situation, it can then be used as a
reinforcer in another situation, a condition that makes the definition
general rather than circular. This position does not provide for an
independent determination of the status of a stimulus as a reinforcer, and
thus effectively eliminates the possibility of testing the question of
whether reinforcement is necessary for learning. That proposition is
accepted as a primitive axiom and is not to be questioned.

PUNISHMENT AND REINFORCEMENT 

The role of punishment in learning is by no means simple, and the
application of punishment to a response may have unpredictable and
varied consequences. In fact, one of the basic unsolved problems in
learning might be called the punishment paradox. The application of
punishment to a response usually leads to its prompt disappearance. The
reader should recall two studies discussed earlier. In one, Bower, Fowler,
and Trapold (1959) demonstrated that the greater the amount of reduction
in shock provided when the rat reached a goal box, the faster it
ran. The results of this study are shown in Figure 4.7. In treating the
effects of punishment of the response on extinction, the study by Seligman
and Campbell (1965) was described in which shocking the animals for
running to a goal box produced faster extinction with greater punishment
for responding. These results fulfill what might be called the “normally”
expected effects of punishment. However, there are situations in which
the application of punishment leads to repetition of the response or
even performance of the response in a manner suggesting that punishment
has produced greater learning. Brown, Martin, and Morrow (1964)
performed a study in which rats were trained to escape shock by running
to a safe goal box. During extinction, the start box was safe for all, but
some animals were shocked in part or all of the alley. In one experiment,
animals shocked in the runway did not extinguish faster than animals
that were not shocked, and in another experiment, animals that were
shocked for making the response extinguished even more slowly. The
authors refer to this performance as “masochistic-like” behavior.

A dramatic example of punishment leading to repetition of the
response is any one of a set of experiments by Maier (1949) and his
students. Typically, an animal is induced to jump from a small platform
toward one of two windows which contain stimulus cards. Animals can
be taught to discriminate between the two cards if jumping to the
correct one leads to access to the platform and food behind the card
and jumping to the incorrect one leads to a bump on the nose and a fall
into a net several feet below. In one study by Ellen and Feldman (1958),
for example, animals were induced to jump when the problem was
unsolvable. Under these conditions, the animal falls into the net about
half of the time whatever he chooses to do. After a few trials the animal
is reluctant to jump, and some form of punishment is needed, such as
shock on the platform, to induce him to jump. Most animals will choose
one side or the other and jump exclusively to that side. At this point, the
problem is made solvable so that either a light or a dark card is
consistently correct. Most animals persist in jumping to one side throughout
a large number of trials. Maier (1949) calls this stereotyped,
inappropriate behavior fixated. He calls the process by which it became
fixated one of frustration. If small runways are provided on half of the
trials so that fixated animals can walk to the cards instead of jumping,
three distinct groups develop. One group quickly learns to walk to the
correct card and soon learns to jump correctly. Another group remains
fixated on both the jumping response and the walking response. A third
group responds correctly when the runways are in place but continues to
jump to one side whether the correct or the incorrect card appears in that
window. In this case, it is clear that punishment has led to behavior which
is abnormally resistant to extinction. Furthermore, since one group of
animals responds appropriately to the positive stimulus when the runway
is in place but does not when there is no runway, punishment has
produced opposite effects in the same animal in very similar situations.

This small sample of research on the effects of punishment on
behavior makes it clear that a major problem exists in specifying the
conditions in which punishment eliminates a response as distinguished
from those in which it leads to repetition or to abnormally inflexible
behavior. Although there is no simple answer to this problem, some
situations may be understood by determining what response is induced
by the punishment itself. Thus Fowler and Miller (1963) trained rats
to run down a runway to obtain food. In addition they arranged the
situation so that some rats could be shocked on the forepaws and others
shocked on the hindpaws during the performance of the running response.
They found that shock on the forepaws produced slower running and
shock on the hindpaws produced faster running. The greater the
intensity of the shock, the greater they found the two effects to be. This
study makes clear that one of the effects of punishment is to produce a
response which may be either compatible or incompatible with the
ongoing behavior. However, the variety of effects of punishment on
behavior is sufficiently great that no simple solution to the punishment
paradox emerges.


PROBLEMS IN THE NECESSARY 
CONDITIONS FOR LEARNING 

While conditioning and instrumental-learning procedures will almost
always produce learning, a number of questions arise concerning the
necessity of fulfilling all of the conditions of these procedures. If some
elements of them are omitted, does learning still occur? In the
conditioning paradigm in which three elements are specified, the CS followed
by the US followed by the response, one can wonder whether the CS is
necessary on every trial or whether association can occur between the
two stimuli even though the response does not occur. Is the response
necessary? In the selective learning paradigm, the three terms are the
stimulus, the response, and the reinforcement. The question here is
whether learning can occur in the absence of the response, the
reinforcement, or both. Such questions are equally applicable to the
extinction process. Let us turn to sensory preconditioning and some other
approaches that attempt to answer questions pertaining to the necessary
conditions for learning.

SENSORY PRECONDITIONING 

If two neutral stimuli are paired together over a number of trials, 
and one of them is subsequently conditioned to a response, will the other 
elicit the response? The answer is that it will, and some form of asso- 
ciation occurs when the neutral stimuli are paired together. A basic 
study which defines sensory preconditioning was done by Brogden 
(1939). He presented a buzz and a light simultaneously 200 times to
eight dogs. He then conditioned a leg flexion response to one of the
stimuli using shock as the US. In 20 test trials he used the stimulus that
had not been conditioned to the response but which had been paired
with the CS in sensory preconditioning. He obtained CRs on about half 
of the trials of the 20-trial test session. Dogs which had not had the 
sensory preconditioning of the two stimuli averaged about a half of a 
response in the test period. 

While sensory preconditioning certainly does occur, and therefore
it can be said that conditioning occurred in the absence of the occurrence
of the response, other studies appear to show that associations formed
in this way do not follow the simple principles of conditioning. Hoffeld,
Kendall, Thompson, and Brogden (1960) varied the number of pairings
of the two neutral stimuli in the sensory-conditioning phase to determine
if a normal learning curve would arise. They used a six-second tone and
a two-second light in preconditioning and gave 0, 1, 2, 4, 8, 10, 20, 40, 80,
200, 400, or 800 trials to 12 different groups of cats. Conditioning
consisted of two seconds of light followed by .1 second of shock. In the
test, the tone was presented alone. No simple learning curve appeared.
The curve rose quickly to a maximum with four pairings of the two
neutral stimuli and then fell to a relatively steady level. One could
almost conclude that the sensory association gained its maximum strength
in three or four pairings and did not increase significantly thereafter.
This is in contrast to a normal acquisition curve in conditioning such as
that in Figure 4.1.

Sensory preconditioning does not seem to produce a typical
generalization gradient. Kendall and Thompson (1960) paired a tone of
250 cps with one of 2,000 cps in 20 trials of preconditioning with cats.
Then the 250-cps tone was paired with shock. The cats were then tested
for conditioning using tones of 500, 1,000, 2,000, 4,000 and 8,000 cps, thus
one to five octaves above the CS. They found about the same number of
responses to tones from 500 to 4,000 cps, and virtually none at 8,000 cps.
Thus the generalization curve was essentially flat instead of sloping, and
the authors conclude that sensory conditioning is essentially an
all-or-nothing affair.
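
The octave spacing of the test tones in this study can be checked with a
short calculation. The sketch below is an illustration only and is not part
of the original study; the function name octaves_above is an assumption
introduced here. It relies on the standard relation that each octave
doubles the frequency.

```python
import math


def octaves_above(reference_cps, test_cps):
    """Return how many octaves test_cps lies above reference_cps."""
    # One octave corresponds to a doubling of frequency.
    return math.log2(test_cps / reference_cps)


cs_tone = 250  # tone paired with shock, in cycles per second
test_tones = [500, 1000, 2000, 4000, 8000]  # test tones from the study

octave_steps = [round(octaves_above(cs_tone, tone)) for tone in test_tones]

for tone, steps in zip(test_tones, octave_steps):
    print(tone, "cps:", steps, "octave(s) above the CS")
```

Each test tone is a whole number of doublings above 250 cps, which is why
the text can describe the series as running from one to five octaves above
the CS.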

CS-CS interval in sensory conditioning does not seem to produce
the same results as CS-US interval in normal conditioning. In four quite
different studies by Silver and Meyer (1954), Coppock (1958), Hoffeld,
Thompson, and Brogden (1958), and Wickens and Cross (1963) backward
sensory conditioning proved to be nearly as effective as forward
conditioning, and the best forward interval was not at about 500 msec,
as is true in classical conditioning. In fact, in the Hoffeld, Thompson,
and Brogden study cited, it was generally true that the longer the interval
the better the sensory conditioning in tests with intervals ranging from
zero to four seconds.

OCCASIONAL OMISSION OF THE CS

In a human eyelid conditioning study, Kimble, Mann, and Dufort
(1955) omitted the CS on the middle 20 trials of 60 trials of training,
a form of pseudoconditioning procedure, and could find no difference
between this group and one for which the CS had been present on all
60 trials. Dufort and Kimble (1958) could replicate the finding, but in
studies by Goodrich, Ross, and Wagner (1957) and by McAllister and
McAllister (1960), such omissions of the CS led to poorer performance.
While the problem remains unresolved, it seems likely that the associative
aspect (H or habit) and the motivational aspect (D or drive) may well
develop somewhat differently in response to number of presentations.

LATENT LEARNING AND LATENT EXTINCTION 

The name latent learning arose from an experimental problem in 
which an animal was permitted to explore an environment without 
receiving reward. He was subsequently provided with reward to deter- 
mine if there was “latent” or unexpressed learning that had occurred 
during the nonrewarded experience. In other forms of the latent-learning
experiment, animals are permitted to experience the location of a goal
object, such as food, either when they are satiated for food and water
or when they are strongly motivated and rewarded for another goal
object. They demonstrate latent learning by later going to the food when
they are hungry.

A fairly typical latent-learning study is that of Spence, Bergmann,
and Lippitt (1950). They ran rats in a T maze while the rats were satiated
for food and water; food was in one arm and water was in the other.
The animals were induced to run by permitting them to return to the
home or “social” cage after each run. After training, some of the animals
were made hungry and some thirsty, and then they were tested for latent
learning. On the following day, the drive states of the two groups were
reversed and the animals tested again. On both days there was a
tendency for the animals to choose the side appropriate to their drive state,
indicating that they had learned something about the location of food
and water during the period of nonreward.

A great many studies of latent learning have been carried out (see 
Kimble, 1961), with some affirming the existence of latent learning and 
others failing to find such evidence. In spite of the frequent negative 
results, the large number of studies showing positive results appears 
to justify the conclusion that some latent learning does occur in the 
absence of relevant reinforcement. However, latent-learning procedures 
usually prove to be inefficient compared to those involving direct reward
to a motivated organism.

In studies of latent extinction, the usual procedure is to train an
animal to make a response, permit him to experience the fact that the
reward has been removed without his making the response, and then
demonstrate the effect of this experience on his performance in
extinction. The classic study of latent extinction is that of Seward and Levy
(1949). They gave two groups of rats 10 training trials in which the rats
ran over a narrow elevated runway from one platform to receive food on
a second platform. One group was given latent-extinction experience by
being placed on the goal platform from which the food was removed
both before and between extinction trials. The other group spent
equivalent periods on a neutral platform. The latent-extinction group showed
significantly faster extinction than the control group. Thus it does not
appear to be necessary for the response to be followed by nonreward for
some degree of extinction to take place.

ACQUISITION AND EXTINCTION 
WITHOUT RESPONDING 

The occurrence of a response can be prevented by temporary
surgical intervention or through the use of drugs. The CS and US can
then be paired any number of times while the animal is unable to
perform the response. After recovery from the surgery or after the drug
effects have worn off, the animal can be tested to see if learning occurred
when it was unable to respond. Similarly, an animal may be conditioned
and the response prevented from occurring during extinction while the
CS is presented a number of times. After recovery, performance should
reveal the effectiveness of such an extinction procedure.

A typical study of learning without responding is that of Lauer
(1951). He inserted an electrode into the motor nerve of a hind leg of
a dog. The animal was then given a type of curare that acts to prevent
muscular response. While the dog was curarized, a tone was paired a
number of times with shock to the leg that could ordinarily have
produced a flexion response. After the dog had recovered from the effects
of the curare, tests revealed that some learning had occurred during
training even though the response had been prevented during the
acquisition trials. Similar results were obtained by Kellogg, Scott, Davis,
and Wolf (1940) when the response was prevented during training by
crushing the motor roots of spinal nerves. When the motor fibers had
regenerated, the animals showed evidence of having been conditioned.
Beck and Doty (1957) confirmed these findings while using several
surgical and drug procedures to prevent the response during training
in cats.

Black (1958) used a curare to prevent the occurrence of an avoidance
response in dogs during extinction. His general procedure was to
train dogs to turn their heads and push a panel to avoid shock. When the
response was well established, some animals were given 55 extinction
trials while they were curarized and others were given the same number
of trials while free to make the response. In 400 additional extinction
trials during which all animals were free to respond to the CS, the
animals with the 55 extinction trials under curare showed evidence of
much faster extinction than the control animals. A variety of
interpretations of this result are possible. It could be argued that making the
response itself has secondary reinforcing properties which ordinarily
extend the extinction process by providing some reinforcement during
extinction. The 55 trials during which the response was prevented could
not provide this hypothesized source of reinforcement. On the other
hand, the avoidance procedure is one in which successful avoidance
prevents the occurrence of the US. In normal extinction, the animal
has no means of discovering that the shock is no longer present as long
as he continues to make the response. The 55 trials under curare
provided this information to the experimental animals.

These studies seem to indicate that both conditioning and extinction
can occur when the response is physically prevented from occurring.
Thus association between stimuli can occur on the basis of stimulus
properties other than their capacity to produce the relevant observable
unconditioned response.

THE THEORETICAL ISSUES 

The basic theoretical issues involved in studies of the necessary
conditions for learning are frequently reduced to two. One issue is
whether association takes place between the stimuli involved, a position
referred to as S-S learning, or whether learning always involves an
association between stimuli and responses, S-R learning. The second
issue is whether simple contiguity is a sufficient principle for learning or
whether some form of reinforcement is necessary. Probably the most
reasonable conclusion is that neither issue involves simple alternatives;
therefore, both S-S and S-R learning occur, as do learning through
contiguity and learning through reinforcement.


LEARNING TO LEARN 

Does the rate of learning improve as an organism learns successive 
problems? The answer to the general question is clearly affirmative, 
although there are a number of issues that arise in experiments designed 
to establish the truth of this proposition.

The simplest test of this question can be made by training an
animal to make a response through reinforcement, instituting extinction
when the learning criterion is met, reestablishing reinforcement when the
response is extinguished, and repeating the process. The question is
whether acquisition and extinction occur more rapidly as the process
continues. In this respect, the acquisition phase appears to be different
from the extinction phase. Lauer and Estes (1955) trained animals to
jump from a platform to one of two windows to obtain a food reward.
The choice of either window was rewarded, and learning was measured
in terms of speed of response (1/latency). There were four periods of
reinforcement separated by three periods of nonreinforcement. In
successive reinforcement periods, the rate of learning was faster, and the
speed of jumping at the end was faster. However, the results in extinction
were different. During successive periods of nonreinforcement the
animals slowed down a lesser amount and at a slower rate. Thus they
extinguished more slowly and showed less extinction.

The difference in effect of repeated learning and repeated extinction
on the acquisition phase as opposed to the extinction phase is even more
pronounced in a study by Lauer and Carterette (1957). Two groups of
animals had two trials a day for 54 days. One group was reinforced on
all trials, while the other had nine days of reinforcement followed by
six days of nonreinforcement, with four reinforcement periods separated
by three nonreinforcement periods. Furthermore, since a straight runway
was used in this study, two scores, starting speed and running speed,
were obtained. After the first acquisition phase, the group with
interspersed periods of extinction showed faster starting speeds and somewhat
faster running times during reinforcement than the group that was
continuously reinforced. Thus, periods of extinction actually produced
higher performance levels than were produced by continued
reinforcement. The difference between acquisition and extinction was even more
marked with respect to starting speed. During the first extinction period,
the starting speed became slower than it was for the animals under
continued reinforcement. During the second extinction period there was
little difference between the two groups, and during the third extinction
period, the group for which reinforcement had been removed actually
started faster than the group under continued reinforcement. Thus, with
repeated reinforcement periods followed by periods of nonreinforcement,
animals seem to learn faster each time, but extinguish more slowly,
if they can be said to extinguish at all.
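
The day counts given for the alternating group in this study can be laid
out explicitly. The following sketch is only an illustration of that
arithmetic; the names build_schedule, REINFORCEMENT_DAYS, and the
"reinforced"/"nonreinforced" labels are assumptions introduced here, not
terms from the original report.

```python
REINFORCEMENT_DAYS = 9      # days in each reinforcement period
NONREINFORCEMENT_DAYS = 6   # days in each nonreinforcement period
TRIALS_PER_DAY = 2


def build_schedule(n_reinforced_periods=4):
    """Return one label per day for the alternating group."""
    schedule = []
    for period in range(n_reinforced_periods):
        schedule += ["reinforced"] * REINFORCEMENT_DAYS
        # A nonreinforcement period separates successive reinforcement
        # periods, so none follows the last one.
        if period < n_reinforced_periods - 1:
            schedule += ["nonreinforced"] * NONREINFORCEMENT_DAYS
    return schedule


schedule = build_schedule()
total_days = len(schedule)
total_trials = total_days * TRIALS_PER_DAY
print(total_days, "days,", total_trials, "trials")
```

Four periods of nine days plus three periods of six days account for the
54 days of the experiment, so the alternating group and the continuously
reinforced group received the same number of trials.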

However, it seems likely that if training were extended over a great
many periods of acquisition and extinction, a stage would be reached
in which both faster learning and faster extinction occurred. It will be
recalled that a small amount of overlearning on a difficult problem
produced an increase in resistance to extinction in a study by Mackintosh
(1963), while large amounts of overlearning produced progressively
faster extinction in a simple problem, in a study by Ison (1962) shown
in Figure 4.17.

However, the effect of such alternation may be very different in 
avoidance learning, as shown in a study by Jacobs (1963). He trained 
rats to choose one of two compartments to avoid shock. He opened a 
starting-compartment door, and after 15 seconds (7.5 seconds for
another group) turned on a shock which remained on until the animal
reached the correct goal compartment. When this response was learned, 
shock was omitted and the response extinguished to a criterion of eight 
successive trials with a latency longer than 15 seconds. In this study, 
there were six periods of acquisition followed by six periods of extinction. 
Acquisition improved between the first and second acquisition periods, 
but showed no further demonstrable improvement. However, successive 
extinctions required increasingly greater numbers of trials, as may be
seen in Figure 4.20. Thus successive periods of acquisition and extinction 
appear to lead to faster learning and slower extinction. 

There have been a number of studies of repeated reversals in 
discrimination problems. In one of the classic studies, North (1950)
trained animals to make one of two responses to achieve a food reward. 
Then the problem was reversed by rewarding the other response. In 
reversal learning, one can think of the reversal task as being one in which 
the first response must be unlearned for the second to be learned, thus 
combining both acquisition and extinction. North carried his animals 
through 12 successive reversals using a number of different procedures. 
The results in Figure 4.21 show that the first reversal produced more errors
than original learning, but after the third reversal, the animals made fewer
errors and the curve seems to be dropping throughout its course. While
there have been a number of studies in which improvement has failed to
appear with successive reversals, others such as those of Birch, Ison, and
Sperling (1960), Mackintosh (1963), and Stretch, McGonigle, and
Rodger (1963) found results very similar to those of North. Under
most circumstances, then, one can expect at first poorer and then
progressively better performance with successive reversals. This means, of
course, that after a few reversals, extinction of the old response cannot
grow progressively longer as was found in the case of successive
acquisition and extinction periods.

After the first reversal, the animal is relearning a problem that was 
learned before. One might ask whether the learning to learn shown in 
Figure 4.21 is a general improvement which would be demonstrated on 
all problems or whether the improvement is confined to the particular 
learning situation. Mackintosh (1962) made a comparison between 
reversal learning and learning a new problem. He also explored the 
effect of the amount of overlearning of the original problem on both

Figure 4.20

The progressive increase in the number of trials
required to reach the criterion of extinction in successive
periods of extinction of an avoidance response
following periods of acquisition. (Adapted from
Jacobs, 1963. Copyright 1963 by The American
Psychological Association and reproduced by permission.)

reversal learning and learning a new problem. To do this, he first trained 
24 rats to choose between a black stimulus and a white stimulus in a 
jumping-stand situation. The animals required about 75-76 trials. A third 
of the animals were given 75 extra overtraining trials, and another third 
were given 150 extra overtraining trials. Each of the three groups was
then divided in half, with four animals trained to reverse the original
black-white discrimination and the other four trained to choose between
horizontal and vertical stripes. The data in Table 4.3 show that the effect

Figure 4.21

Change in error score on trials two through six of 
original learning and successive reversals showing the 
improvement in learning score with repeated expe- 
rience. (Adapted from North, 1950. Copyright 1950 
by The American Psychological Association and re- 
produced by permission.) 

of overlearning was quite the opposite between reversal and the new
problem. As original training was extended beyond the criterion, reversal


Table 4.3

Effects of overlearning on subsequent reversal learning
and learning a new discrimination problem (Mackintosh, 1963).

                                       TRIALS TO REACH CRITERION

TRAINING CONDITIONS ON                 REVERSAL       LEARNING
ORIGINAL PROBLEM                       LEARNING       NEW PROBLEM

Trained to criterion                    124.75          84.25
Trained 75 trials beyond criterion       89.50         105.00
Trained 150 trials beyond criterion      78.25         140.00


learning became easier and easier, but the effect of overtraining was to 
make the learning of the new problem more and more difficult. The
terms usually used to refer to these effects are positive and negative
transfer. Thus the effect of overlearning of the first problem was positive
transfer, or a facilitative effect on the learning of the reverse of the
original problem. However, overlearning had a negative transfer, or 
inhibiting effect, on the learning of a new one. 
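
The opposite directions of the two effects can be read directly from the
values in Table 4.3. The sketch below simply subtracts the
trained-to-criterion baseline from each row; it is an illustration only,
and the names table_4_3 and change_from_baseline are assumptions
introduced here. A negative change means fewer trials to criterion
(positive transfer); a positive change means more trials (negative
transfer).

```python
# Each row: (extra overtraining trials,
#            trials to criterion in reversal learning,
#            trials to criterion in learning a new problem)
table_4_3 = [
    (0, 124.75, 84.25),
    (75, 89.50, 105.00),
    (150, 78.25, 140.00),
]


def change_from_baseline(column):
    """Difference between each row and the trained-to-criterion row."""
    baseline = table_4_3[0][column]
    return [row[column] - baseline for row in table_4_3]


reversal_change = change_from_baseline(1)
new_problem_change = change_from_baseline(2)

for (extra, _, _), rev, new in zip(table_4_3, reversal_change, new_problem_change):
    print(extra, "extra trials: reversal", rev, "| new problem", new)
```

With 150 overtraining trials, reversal learning took 46.5 fewer trials than
the baseline, while the new problem took 55.75 more.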

That the negative-transfer effect is a temporary one in a longer 
series of problems is made clear in the work of Harlow (1949, 1959) on
learning sets. He and Margaret (Kuenne) Harlow have demonstrated
repeatedly that both monkeys and young children show progressive
improvement in learning new problems. For example, monkeys can be
trained to choose one of two objects on the basis of any one of a number
of differences between the two, such as shape or color. The first learning
period might require several hundred trials. However, if an animal learns
many such problems, it will learn to learn, and become very efficient in
solving new problems in a very small number of trials. Figure 4.22 shows



Figure 4.22

Discrimination and discrimination-reversal curves
based on the percent of correct second-trial responses
showing the development of learning sets through a
long series of problems. (Adapted from Harlow, 1949.
Copyright 1949 by The American Psychological
Association and reproduced by permission.)


the progress of a group of eight monkeys learning a series of
discrimination problems and subsequent progress of the same group in learning a
series of discrimination reversal problems. In both cases, the curves are
approaching 100 percent. Since the first trial of a new problem or a
reversal is an “information” trial, 100 percent performance on the second
trial would represent maximum efficiency of learning in a single trial. 
It should be noted, however, that the demonstration of learning sets 
involves a skill within a narrow range of similar problems. The animals 
had reached the 90 percent point in solving discrimination problems but 
dropped to slightly over 60 percent in the first block of discrimination 
reversal problems. They again exceeded the 90 percent level only after 
solving more than 50 of the discrimination reversal problems. 

Acquiring a “set” to learn in a certain type of situation is not confined 
to higher organisms such as monkeys and children. Wright, Kay, and 
Sime (1963) have succeeded in demonstrating the development of
learning sets in rats. They trained 16 rats to discriminate shapes in a series
of problems. This is a difficult discrimination problem for rats, but all
learned some of the problems, and three of the animals succeeded in
solving 32 problems over a seven-month period. Animals that solved an
appreciable number of the problems showed progressive improvement 
in learning efficiency. 

Let us summarize: 

1. It appears that overlearning produces first an increase in resistance
to extinction and then a decrease as training is extended.

2. In avoidance learning, alternation of periods of reinforcement and
nonreinforcement results in progressively faster acquisition and
indefinitely increasing resistance to extinction.

3. Alternation of periods of reinforcement and nonreinforcement
probably leads to faster learning in successive periods. Resistance to
extinction, however, tends to increase in successive periods but may
decrease if the number of extinction periods is great enough.

4. Successive reversals of a two-choice problem produce first slower
and then faster learning on each successive reversal, requiring that rate
of extinction must increase and then decrease over a long series of
reversals.

5. Overlearning of one problem slows up learning of a new problem
in a short series of problems; in a long series of similar problems,
learning sets develop which produce very rapid and highly efficient problem
solutions.

While learning sets appear to be partly specific to the type of
problem in which they are developed, the progress from alternation of
periods of reinforcement and extinction, through reversal learning of the
same problem, through the learning to learn a class of problems, suggests
that something more than the single measured response is learned in a
period of training. However it is characterized or explained, some positive
transfer appears to occur and something like learning to learn is acquired.



OPERANT CONDITIONING 


5 


Operant conditioning is the name applied by Skinner (1938) to a
procedure of exerting control over the behavior of an organism in a
relatively free environment by means of the judicious application of
reinforcement. In some of its applications, operant conditioning
represents a maximum of flexibility, in contrast to the less flexible behavior
involved in classical conditioning and in instrumental learning.

In classical conditioning, for example, one always starts with a 
response which is reliably elicited by a specific stimulus that is under 
the control of the experimenter. Thus by applying the stimulus, the 
experimenter can produce the response at will. The CS, the US, and 
the responses are all under the immediate and precise control of the 
experimenter. The flexibility in the situation is confined almost exclusively 
to the transfer of associative connection from the US to the CS. Skinner 
(1938) referred to this kind of learning situation as respondent condi- 
tioning to distinguish both the procedure and the process from the more 
flexible operant conditioning. 

In instrumental learning, the behavior of the organism is almost 
always constrained in two ways. In most cases, the animal or person is 
placed in a situation in which the freedom to respond is limited by the 
apparatus or the situation to a small number of clearly specified 
alternatives. Sometimes this restriction is so severe that the organism is 
free only to sit or run down a runway. Rarely are the alternatives more 
than two 
or three. The second important constraint is that the learning process is 
almost always studied in terms of “trials” arranged by the experimenter. 
That is, the animal is placed in a situation by the experimenter, is per- 
mitted to choose one of the alternatives, is reinforced or not depending 
on the choice, and terminates the trial by his response. The process 
is then repeated in a number of discrete trials. This constraining pro- 
cedure is referred to as being “experimenter-controlled.” 

Operant conditioning, on the other hand, is often referred to as 
being “subject-controlled.” Typically the organism is free to do what it 
likes and when it likes. The experimenter exerts control over the 
behavior through the application of reinforcement. Thus the behavior 
is emitted by the organism in operant conditioning, rather than elicited 
through the application of an unconditioned stimulus as in respondent 
conditioning. Instead of “trials” in which the beginning is determined by 
the experimenter and the ending determined by the organism, in operant 
conditioning the organism determines the time of both the beginning 
and the end of the trial. 

THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 

Operant conditioning is defined in terms of procedure. The program 
of research which grew out of it, however, has a number of special 
characteristics which do not necessarily follow from the procedure. 
Rather, they are important positions initiated by Skinner (1938, 1953, 
1951, 1961) and developed, in part, by numerous others on fundamental 
issues relevant to the field of learning and learning theory. The whole 
complex of specialized terminology, style of experimentation, and attitudes 
toward theoretical and experimental issues, as well as directions of major 
research, has come to be referred to as aspects of the experimental 
analysis of behavior. A few of these characteristics and problems need 
further discussion. 

REINFORCEMENT 

Skinner does not ask why a stimulus is reinforcing; he only seeks to 
determine that a stimulus has reinforcing properties. He feels that such 
properties can be shown by a simple test: Choose some discrete aspect 
of the behavior emitted by an organism, one that occurs often enough 
to be counted. Count the frequency over a fixed period of time so that 
you determine the rate of emission of that bit of behavior. (That rate 
of emission is called the operant level.) Then, for a period, follow each 
instance of an emission of that bit of behavior by a stimulus. If the rate 
of emission increases, then you have established the reinforcing 
properties of the stimulus. Skinner argues that this defining procedure is 
not circular because once it is determined that a stimulus does have 
reinforcing properties, it can then be used as a reinforcer in many 
situations. Thus, the “transsituational” property of reinforcers is said to 
establish their generality. 
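
The defining test can be sketched numerically. The following is only a 
minimal illustration in Python, not a description of Skinner's own 
apparatus or records: the function names, the ten-minute observation 
periods, and the response counts are all hypothetical. 

```python
def emission_rate(response_count, minutes):
    """Responses per minute over a fixed observation period."""
    if minutes <= 0:
        raise ValueError("observation period must be positive")
    return response_count / minutes


def is_reinforcer(baseline_count, test_count, minutes):
    """Apply the defining test to two observation periods of equal length.

    baseline_count: responses counted before the stimulus was introduced
                    (this period gives the operant level)
    test_count:     responses counted while each response was followed
                    by the stimulus
    """
    operant_level = emission_rate(baseline_count, minutes)
    test_rate = emission_rate(test_count, minutes)
    return test_rate > operant_level


# Hypothetical counts: 30 responses in the baseline period,
# 120 responses while each response is followed by the stimulus.
print(emission_rate(30, 10))        # operant level, responses per minute
print(is_reinforcer(30, 120, 10))   # did the rate of emission increase?
```

A real determination would of course rest on more than a single pair of 
counts; the sketch shows only the logic of comparing a rate with the 
operant level. 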

In Skinner’s terms, reinforcing stimuli may be either positive or 
negative. A positive reinforcing stimulus increases the rate of operant 
responding when it is applied immediately after each response. A negative 
reinforcer increases the probability of a response when it is removed 
immediately after each response. Punishment is defined as the removal 
of a positive reinforcer or the application of a negative reinforcing 
stimulus. 

THEORY 

Skinner’s refusal to ask why a stimulus has reinforcing properties 
is part of his general resistance to the formulation of theoretical propo- 
sitions. He takes the position that when we know enough to exercise 
control over behavior, we shall have no need for theory. Skinner’s 
theorizing about behavior thus remains consistently informal, inductive, 
and implicit rather than formal, deductive, and explicit. He prefers to 
develop broad empirical generalizations rather than specific constructs. 
In the terms of theory structure as diagramed elsewhere in this volume 
and in Psychology as a Natural and Social Science (Walker, 1967), 
Skinner’s theory has no intervening variables, and an attempt to diagram 
it would produce an empty box. 

EXPERIMENTAL STYLE 

In Basic Statistics, Hays (1967) explains several ways to determine 
the statistical significance of an experimental finding. Most procedures 
involve dividing the difference between the mean scores of two groups 
by an estimate of the variability in score within the groups. If the result- 
ing ratio is as large as a predetermined standard, the result is said to be 
statistically significant. There are generally two ways to increase the 
size of the ratio. Since the estimate of variability is determined by N 
(the number of organisms) and by the variability of the behavior in 
question, one can choose to increase the size of N or to decrease the 
variability. Increasing the size of N is straightforward but laborious. 
Producing a decrease in variability means increasing the quality of the 
experimenter’s control so that factors other than the one under study 
cannot produce unsystematic variation in the scores. Skinner clearly 
chooses the latter procedure and undertakes to exert sufficient control 
over the behavior in question to make a statistical test unnecessary. Thus 
he chooses to exercise experimental rather than statistical control over 
his subjects. It should be obvious that this choice leads to an ultimate 
style of research in which a single organism is a sufficient “group” for 
the establishment of a principle. Additional organisms are then tested 
only to determine that the principle works with all or most individuals. 
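
The two routes to a larger ratio can be illustrated with a small 
computation. The sketch below assumes that the ratio in question is the 
pooled-variance t ratio for two independent groups, which is one common 
procedure of the kind described; the function name and all scores are 
hypothetical. 

```python
from math import sqrt
from statistics import mean, variance


def t_ratio(group_a, group_b):
    """Difference between group means divided by its standard error.

    Assumes two independent groups and a pooled estimate of the
    within-group variance (an assumption of this sketch).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical scores: both groups of three differ by 4 points on average.
baseline = t_ratio([10, 12, 14], [6, 8, 10])

# Route 1: same scores, twice as many organisms in each group.
larger_n = t_ratio([10, 12, 14] * 2, [6, 8, 10] * 2)

# Route 2: same N and same mean difference, less variable scores.
less_variable = t_ratio([11, 12, 13], [7, 8, 9])

print(baseline, larger_n, less_variable)
```

Both routes raise the ratio above its baseline value; the second does so 
without testing a single additional organism, which is the choice 
attributed to Skinner above. 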

CUMULATIVE CURVES 

Data about an individual performance can be recorded by a cumulative 
recorder, which consists of a pen that draws a line on a roll of 
paper moving at a constant rate. Each time the animal makes a response, 
the pen moves a fixed, small distance across the roll of paper. If the 
animal responds rapidly, the pen will describe a steep slope across the 
paper. If the animal responds slowly, the slope will be less steep. If the 
animal is irregular in his rate of responding, the curve will have a 
variable slope. 

Figure 5.1 is a curve that might have resulted from the following 
simple demonstration. Suppose we place a hungry animal in a Skinner 
box and leave him there for a while. A Skinner box is a device that is 
usually barren except for a small lever and food tray. It is likely that 



Figure 5.1 

Drawing representing a cumulative record of the 
acquisition of an operant response. (In this aspect, 
the paper moved from right to left; on a cumulative 
recorder, the paper would have unrolled upward.) 

sooner or later he will accidentally press the lever while exploring the 
box, and a pellet of food will be delivered to the food tray. Pressing the 
lever will cause the pen to trace a notch across the moving roll of paper. 
The pen will then draw a straight line until the lever is pressed again. 
If the animal presses steadily, the pen will trace a series of notches 
across the moving paper. [Although Figure 5.1 happens to be an artist’s 
drawing, one of the virtues of cumulative records is that they can 
be photographed directly, for there is no need to “process” such data.] 
The curve in Figure 5.1 rises almost abruptly from the baseline, and 
its characteristic slope is established almost immediately. It is obvious 
that once the animal pressed the lever, little time was spent in learning 
to associate the pressing of the lever with the delivery of food. Thus, 
there was very little “learning” in the sense of learning how to do 
something. Had there been appreciable learning in this sense, the curve 
would have risen slowly, with an increasing slope, until full learning was 
complete, after which the slope would have remained constant. 
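
The relation between rate of responding and the slope of a cumulative 
record can be sketched in a few lines of Python. This is an illustration 
only: the function names and the response times (in whole seconds) are 
hypothetical, and the record is sampled once a second rather than drawn 
continuously as a real recorder would draw it. 

```python
def cumulative_record(response_times, duration):
    """Cumulative response count at each whole second from 0 to duration."""
    record = []
    for second in range(duration + 1):
        record.append(sum(1 for t in response_times if t <= second))
    return record


def slope(record, start, end):
    """Average rate (responses per second) between two sampled seconds."""
    return (record[end] - record[start]) / (end - start)


# Hypothetical animals observed for 20 seconds.
fast = cumulative_record(list(range(2, 21, 2)), 20)   # a press every 2 s
slow = cumulative_record(list(range(5, 21, 5)), 20)   # a press every 5 s

print(slope(fast, 0, 20))   # steeper slope: rapid responding
print(slope(slow, 0, 20))   # shallower slope: slow responding
```

A horizontal stretch of the record (a slope of zero) corresponds to a 
period in which no responses were made. 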

The interpretation of Figure 5.1 points up an essential difference 
between operant conditioning and other forms of “learning.” If one 
makes a distinction between learning and performance, operant 
conditioning is largely concerned with factors which influence the performance 
level rather than factors which influence learning. Learning is reflected 
in the rate of approach to the asymptote of a learning curve. In terms of 
a cumulative record, operant conditioning is largely concerned with 
factors which will influence the asymptote of a learning curve, not the 
rate of approach to it. 

To continue our demonstration, if we stop delivering food when 
the animal presses the lever, we shall have established experimental 
extinction conditions. Figure 5.2 is the kind of record we might expect. 
The slope of the curve indicates rapid responding at the outset of 
experimental extinction. This burst is followed by a pause. As extinction 
continues, the bursts of responding become shorter and pauses longer 
until the curve becomes essentially horizontal, indicating no further 
responses. 



Figure 5.2 

Drawing representing a cumulative record of the 
experimental extinction of an operant response. 

Figure 5.2 illustrates another characteristic of operant conditioning. 
Instead of placing exclusive emphasis on extracting data and calculating 
various values, one can “interpret the grain” of a cumulative record. The 
bursts and pauses of Figure 5.2, the scallops of Figure 5.3, and the bursts 
and pauses in Figure 5.8 are characteristic of the “grain of the record.” 
A variety of explanations have been offered to explain scallops, bursts, 
and pauses. 


SCHEDULES OF REINFORCEMENT 

The capacity to delay gratification is regarded as a sign of maturity 
in humans. A similar capacity is often seen in animals: a dog will wait a 
long while and will do a great many things for an occasional pat on the 
head. A major experimental innovation of Skinner’s (1938) is the 
deliberate omission of reinforcement after some of the responses emitted by 
an organism. This procedure, in contrast to continuous reinforcement, 
is referred to as intermittent or partial reinforcement. Partial 
reinforcement has been applied in a great many patterns, called schedules 
of reinforcement. 

RATIO REINFORCEMENT 

If reinforcement is offered only after an animal has made a fixed 
number of responses, say five, then he is on a fixed-ratio schedule. In 
training an animal or human to perform on a fixed-ratio schedule, 
especially if a small ratio is used, one must generally start with continuous 
reinforcement until the organism is responding well. Then intermittent 
reinforcement can be instituted and only gradually can small ratios be 
imposed. Three curves that are typical of performances under fixed-ratio 
schedules may be seen in Figure 5.3. The curves represent the behavior 



Figure 5.3 

Performance curves of three animals under fixed-ratio 
reinforcement schedules. Reinforcements are marked 
by horizontal lines. The ratios are indicated by the 
numbers. (Figure 91 from The Behavior of Organisms 
by B. F. Skinner. Copyright 1938 by Appleton- 
Century-Crofts, Inc. Reprinted by permission of 
Appleton-Century-Crofts.) 


of animals that were induced to work on fixed-ratio schedules of 48:1, 
96:1, or 192:1. After each reinforcement (indicated by a horizontal line), 
the animals paused, then responded at a positively accelerating rate; 
the most rapid rate of responding was just before the next reinforcement. 
Ratios much higher than those shown in Figure 5.3 have been found 
effective. While there are limits to this process, it is astonishing how hard 
an organism can be induced to work to receive a single reinforcement. 

Ratio schedules can be made variable rather than fixed. If an 
organism is reinforced once in five responses on the average, he is said 
to be on a variable-ratio schedule. Performances of animals on 
variable-ratio schedules tend to be somewhat higher in rate than performances 
obtained on equivalent fixed-ratio schedules. Their performance curves 
fail to show such pronounced “scallops” as those evident in the grain 
of the curves in Figure 5.3. Since an animal does not know precisely 
when to expect reinforcement, his tendency to speed up in anticipation 
is not so pronounced. 
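
The difference between the two ratio schedules lies only in the rule 
that decides which responses are reinforced. The following Python sketch 
makes the rules explicit. It is a hypothetical illustration: the function 
names are invented here, and the variable-ratio rule draws each required 
count uniformly from 1 to twice the mean ratio less one, which is just 
one of many ways to obtain the stated average. 

```python
import random


def fixed_ratio(n_responses, ratio):
    """Response numbers that earn reinforcement on a fixed-ratio schedule."""
    return [r for r in range(1, n_responses + 1) if r % ratio == 0]


def variable_ratio(n_responses, mean_ratio, seed=0):
    """Response numbers reinforced on a variable-ratio schedule.

    Each required count is drawn uniformly from 1 .. 2 * mean_ratio - 1,
    so the counts average mean_ratio (an assumption of this sketch).
    """
    rng = random.Random(seed)
    reinforced = []
    since_last = 0
    target = rng.randint(1, 2 * mean_ratio - 1)
    for r in range(1, n_responses + 1):
        since_last += 1
        if since_last == target:
            reinforced.append(r)
            since_last = 0
            target = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


print(fixed_ratio(20, 5))       # every fifth response is reinforced
print(variable_ratio(20, 5))    # irregular spacing, about one in five
```

On the fixed-ratio rule the organism could in principle count its way to 
the next reinforcement; on the variable-ratio rule no such count is 
available, which is consistent with the absence of pronounced scallops. 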

INTERVAL REINFORCEMENT 

Another type of schedule involves reinforcement at a fixed interval. 
In this type of schedule, reinforcement is delivered after the first response 
following the expiration of a fixed time interval. It is necessary to make 
reinforcement contingent on a response, even though it is time that 
determines the appearance of the reward. 
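
The fixed-interval rule can be stated compactly in code. In the Python 
sketch below, the function name and the response times are hypothetical, 
and the interval is assumed to be timed from the preceding reinforcement 
(or from the start of the session); the text above does not specify the 
starting point of the interval. 

```python
def fixed_interval(response_times, interval):
    """Times of the responses reinforced on a fixed-interval schedule.

    A response is reinforced only if at least `interval` seconds have
    passed since the previous reinforcement (or since time zero).
    Responses made before the interval expires have no effect.
    """
    reinforced = []
    available_at = interval
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced


# Hypothetical response times in seconds, with a one-minute interval.
print(fixed_interval([10, 50, 61, 70, 100, 125, 130], 60))
```

Note that the passage of time alone never produces reinforcement in this 
sketch: with no responses at all, the list of reinforcements is empty. 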

Figure 5.4 is a tracing of a record of an animal on a variable-interval 



Figure 5.4 

Performance of an animal under a one-minute 
variable-interval schedule of reinforcement by water 
after water deprivation. Each mark on the curve 
represents a reinforcement. (Redrawn from Figure 11 
of The Experimental Analysis of Behavior by 
Thom Verhave. Copyright © 1966 by Meredith 
Publishing Company. Reprinted by permission of 
Appleton-Century-Crofts.) 


schedule. This record, taken from Verhave (1966), represents the 
performance of a rat pressing a lever to receive water on a variable-interval 
schedule of one minute. The animal had been without water for 22 
hours. His rate of responding, as revealed by the slope of the curve, 
remained fairly constant over a considerable period and then gradually 
became negatively accelerated as the animal approached satiation for 
water. The curve in Figure 5.4 is typical of interval-schedule performances. 

Other Schedules. While the simple ratio and interval schedules are 
the most commonly employed, a large number of others have been used. 
They have such names as tandem, mixed, interlocking, alternative, 
concurrent, conjunctive, interpolated, adjusting, and differential-rate 
schedules. As the names imply, some are complicated, and some involve 
simultaneous use of more than one reinforcement pattern. Possibly, the 
most productive schedule is the one that rewards only rapid responding. 
For descriptions of these and other schedules of reinforcement, the 
reader should consult Ferster and Skinner (1957) or Verhave (1966). 

RESISTANCE TO EXTINCTION 

One of the more interesting findings that has emerged from the 
study of operant conditioning is the high resistance to extinction which 
follows removal of all reward after intermittent reinforcement. An 
organism will respond far longer in the absence of reward after 
intermittent reinforcement than after continuous reinforcement. 

Figure 5.5 is an illustration of the higher resistance to extinction 
with intermittent reinforcement. In this case, the measure of performance 
is running speed in a short runway. These results are from a study by 
Weinstock (1954). He trained four groups of 12 rats each to run down a 
short runway to food while hungry. They had one trial a day for 70 days 
during which the four groups were reinforced on 30 percent, 50 percent, 
80 percent, or 100 percent of the trials. Then, during a period of 
extinction, none of the rats was reinforced for 20 days. Weinstock found no 
difference in running speed among the four groups during training, 
but there are fairly clear differences during extinction, as illustrated in 
Figure 5.5. Although the curves do not maintain a simple order throughout 
the trials, there is a clear tendency for intermittently reinforced animals 
to run faster than continuously reinforced animals. Furthermore, in terms 
of persistence, there is a clear tendency for the groups to differ during 
the last 12 days in such a way that the smaller their percentage of 
reinforcement was during the training, the faster they continued to run 
during the latter part of the period of extinction. 

Figure 5.5 

Experimental extinction of running speed in rats in a 
runway resulting from different percentages of partial 
reinforcement during training. (Adapted from Weinstock, 
1954. Copyright 1954 by The American Psychological 
Association and reproduced by permission.) 


The same phenomenon has been demonstrated in the slot-machine- 
playing behavior of human subjects in three studies by Lewis and Duncan 
(1956, 1957, 1958). The three studies were sufficiently similar that the 
results from all three have been plotted in Figure 5.6. In each case, 
subjects were asked to play a slot-machine-like device and given an 
unlimited (for all practical purposes) supply of disks to be used in 
playing. They were told that they could play as long as they liked and 
that they would receive a dime for each disk they received as a payoff 
from the machine. In all cases, there was a brief period during which 
the device paid off by some percentage of reinforcements for each group, 
ranging from zero percent to 100 percent. Then the device ceased to pay. 
The relevant measure of performance was the number of times the 
subjects then pulled the lever without further reinforcement. Because 
there was great variability in the scores achieved by the different subjects 
within each group, the scores were converted to logarithms and the 
means of the logarithmic values were plotted. Figure 5.6 shows clearly 
that the smaller the percentage of intermittent reinforcement during the 
training period, the longer the subjects persisted in pulling the handle 
after no further disks were forthcoming from the machine. 
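
The effect of converting highly variable scores to logarithms before 
averaging can be seen in a small example. The Python sketch below uses 
invented scores, not data from the Lewis and Duncan studies, and it 
assumes base-10 logarithms, since the base used is not stated above. 

```python
from math import log10
from statistics import mean


def mean_log_responses(scores):
    """Mean of the base-10 logarithms of the scores (each must be > 0)."""
    return mean([log10(s) for s in scores])


# Hypothetical numbers of lever pulls during extinction for one group.
# One subject persisted far longer than the others.
pulls = [10, 100, 1000]

print(mean(pulls))                 # arithmetic mean, pulled up by 1000
print(mean_log_responses(pulls))   # mean of the logarithms
```

The arithmetic mean of these invented scores is 370, dominated by the 
single extreme subject, whereas the mean of the logarithms corresponds 
to 100 pulls, the middle score. 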

It should be noted that none of these studies comparing the effect of 
different percentages of reinforcement on number of responses to 
extinction involved the use of cumulative recorders and the presentation of 
unprocessed records. In the Lewis and Duncan studies and the study by 
Weinstock, traditional methods were used to record the data, and the 
results reported are in terms of group performances. 







Figure 5.6 

Number of responses during experimental extinction 
of a lever-pulling response in human subjects resulting 
from differences in percentages of reinforcement 
during training. (Adapted from Lewis and Duncan, 
1956, Lewis and Duncan, 1957, and Lewis and 
Duncan, 1958. Copyright 1956, 1957, 1958 by The 
American Psychological Association and reproduced 
by permission.) 


The studies discussed are a sample of a very large group of studies 
of the effects of intermittent reinforcement on resistance to extinction. 
An old but comprehensive review of partial reinforcement, including 
efforts to give explanations of the effect, is contained in Jenkins and 
Stanley (1950). Several textbooks and monographs also contain summary 
discussions. See Ferster and Skinner (1957), Keller and Schoenfeld 
(1950), Kimble (1961), and Skinner (1938, 1953, and 1961). 


DISCRIMINATION 

Discrimination learning in operant conditioning follows a pattern 
similar to discrimination learning in other forms of training. The basic 
procedure is to present one stimulus in the presence of which the operant 
response will be reinforced and another stimulus in the presence of 
which the response will not be reinforced. Evidence of discrimination 
will be the performance of the response in the presence of one stimulus 
and the absence of the response in the presence of the other. 

A typical study of a discriminated operant response is one by Lane 
(1960). He first had to determine whether he could gain operant control 
over the chirping behavior of a bantam chicken. Starting when the chick 
was five weeks old, the animal was placed in a box that was fitted with a 
microphone. The microphone was wired to a voice relay, a device which 
can be activated by sound of critical intensity. The sensitivity of the 
voice relay was set so it responded to about 95 percent of the chirping 
sounds emitted by the chick. The operant rate of chirping reported by 
Lane was about 16 chirps a minute before operant control was 
established. When the chicken was hungry, the delivery of its food was 
contingent on a chirp loud enough to activate the food delivery 
mechanism. When the mechanism was activated, a food tray was made 
available to the chicken for a period of four seconds. 


Figure 5.7 

Cumulative response curve for the chirp response 
of one chicken under a fixed-ratio schedule of 
reinforcement. The chick was reinforced for every 
twentieth chirp; reinforcements are represented by 
the marks on the rising line. The period represented 
is slightly more than 16.5 minutes, with approximately 
2,800 responses during the period. There are 
approximately 140 reinforcements on the record, with 
animals receiving a reinforcement about once every 
seven seconds. (From Lane, 1960. Copyright 1960 
by The American Association for the Advancement of 
Science. Reprinted by permission of the author and 
publisher.) 


Operant control was established, and a fixed-ratio schedule was 
instituted so that the food tray was presented for a four-second period 
only after every twentieth chirp. Under these conditions, the rate of 
chirping increased sharply. Figure 5.7 is a drawing representing the 
cumulative record of the chirping behavior of a chicken after chirping 
in the presence of the fixed ratio of reinforcement was well established. 
The rate of responding in Figure 5.7 is fairly high and uniform, with 
reinforcements being achieved about every seven seconds. 
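
The figures given for the record in Figure 5.7 (approximately 2,800 
responses in about 16.5 minutes on a ratio of 20 chirps per 
reinforcement) can be checked against one another with a little 
arithmetic. The Python function below is a hypothetical helper written 
for this check; only the three input values come from the record 
description. 

```python
def fixed_ratio_summary(total_responses, ratio, minutes):
    """Derive summary values for a fixed-ratio record.

    Returns (number of reinforcements,
             seconds between reinforcements,
             responses per minute).
    """
    reinforcements = total_responses // ratio
    seconds_per_reinforcement = minutes * 60 / reinforcements
    responses_per_minute = total_responses / minutes
    return reinforcements, seconds_per_reinforcement, responses_per_minute


count, spacing, rate = fixed_ratio_summary(2800, 20, 16.5)
print(count)               # reinforcements on the record
print(round(spacing, 1))   # seconds between reinforcements
print(round(rate))         # chirps per minute in this record
```

The derived values agree with the description: 140 reinforcements, 
spaced roughly seven seconds apart. 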

The operant level of chirping was about 21 chirps a minute before 
control was established, and the level was about 27 a minute when food 
was continuously present but not contingent upon chirping. Under a fixed 
ratio of reinforcement, the rate was approximately 115 a minute, somewhat 
slower on the average than the rate depicted in Figure 5.7. 

To establish that the high rate of chirping for food reinforcement 
represented real operant control, Lane established two control groups. 
To one group, he delivered food at the same rate the animal had 
achieved under ratio reinforcement but made delivery of food 
independent of the chirping behavior. The rate of chirping was only nine a 
minute in this group. For another group, he made the appearance of the 
empty tray contingent upon chirping. Their rate was only eight a minute. 
Thus, simple stimulus change as represented by the appearance of the 
tray was not reinforcing. 


Figure 5.8 

Cumulative response curve for the chirp response of 
one chicken under a discriminative procedure. The 
curve represents approximately 450 responses made 
over a period of approximately 16.5 minutes. The 
rising portions are responses made in the presence of 
the positive, red light; reinforcement is represented 
by the four abrupt rises and drops in the curve. The 
flat portions indicate the low response rate in the 
presence of the green, negative stimulus. (From Lane, 
1960. Copyright 1960 by the American Association 
for the Advancement of Science. Reprinted by 
permission of the author and publisher.) 


In order to establish discrimination, Lane put the animal on a 
fixed-interval schedule while a red light was on. The animal was reinforced 
for the first chirp after two minutes had elapsed. Following one 
reinforcement on this fixed-interval schedule, a green light came on, and 
reinforcement became contingent on two minutes of silence. The red 
light and the green light were alternated. Figure 5.8 is drawn from a 
record made after discrimination was well established. It shows a pause 
after each reinforcement, then a positively accelerating rate of response, 
then the delivery of reinforcement; flat portions of the line indicate a 
lack of responses while the green light was on. 

The animals learned this discrimination fairly well in about one hour. 
During the fourth hour of discrimination training, the average response 
rate in the presence of the positive red stimulus was about 22 a minute, 
and in the presence of the negative green stimulus about three a minute. 

SHAPING 

An important new principle of learning arises in the context of 
operant conditioning. That principle is the shaping of a response. In 
classical conditioning the US elicits a given response that is then 
associated with the CS. The CR is essentially the same as the UR. In 
instrumental learning, a limited number of discrete and completely 
identifiable responses can occur in the experimental situation because of 
the character of the apparatus. Operant conditioning offers the possibility 
of the development of an entirely new response in the situation, a response 
that would not occur naturally. Shaping is a highly descriptive name for the 
process. Starting with the responses the organism can and will make in 
a free environment, its behavior can be shaped into almost any pattern 
that is within the scope of the experimenter’s or trainer’s ingenuity. 

EXTERNAL SHAPING 

There are two ways to shape a response. The first way might be 
referred to as external shaping. If one wishes an organism to make a 
particular response (pressing a lever to obtain food, for example), the 
environment can be arranged to make this response more likely. If the 
space in which the animal is placed is large, he is likely to spend much 
more time exploring than if it is small. If the environment is complex, 
he will spend more time exploring than if it is simple. Therefore, choosing 
an enclosure that is small and simple should make the lever-pressing 
response likely to occur sooner than choosing a space that is large and 
complex. If the lever is of such a size, shape, and in such a location that 
the animal is likely to press it accidentally, we can say that the response 
has been shaped by external means. In the language of Skinner, 
classically conditioned responses are rigidly shaped, while instrumental 
responses are somewhat less rigidly shaped but are still highly controlled 
externally. 

INTERNAL SHAPING 

The second kind of shaping, internal shaping, can occur in a very 
free, highly unstructured environment. It can be referred to as internal 
shaping because the constraint imposed upon the behavior of the 
organism eventually lies within the organism rather than in the physical 
environment. Skinner (1951) describes the process in detail. As is 
evident in Skinner’s account, the process of internal shaping can be 
described with considerable objectivity, but the execution of the process 
requires intelligence, ingenuity, and great skill on the part of the person 
doing the shaping. 

Suppose we train a dog to dance on command. The first step 
described by Skinner (1951) is to establish some stimulus as a secondary 
reinforcer under the immediate and precise control of the experimenter. 
There are at least two reasons for this step. Shaping proceeds most rapidly 
and effectively when a reinforcement precisely coincides with the 
response. Most primary reinforcers, such as food for a hungry animal, 
require some time for delivery and consumption. Furthermore, a 
secondary reinforcer can be used many times without producing satiation, as 
would food, for instance. Thus, training can be extended much longer 
with a secondary reinforcer than with a food reward. 

Any neutral stimulus could be used as a secondary reinforcing 
stimulus, but Skinner advises the use of a sound that does not require a 
preparatory move. Visual stimuli might not be seen by the dog, and a 
whistle requires an intake of breath that creates a delay in the sequence 
of response and reinforcement. Skinner, in his demonstration, used a 
“cricket” of the kind that children used to be able to buy or which used 
to come, occasionally, in a box of Cracker Jack. If no cricket is available, 
snapping one’s fingers will do as well. The trainer should be sure the 
animal is quite hungry and should have an attractive food in pieces that 
are large enough for the animal to appreciate, but small enough so that 
a number can be used without satiating the animal. The fingers must be 
snapped as a piece of food is tossed to the animal; the sound should 
come immediately before the delivery of the food. In order to establish 
the snap of the fingers as a secondary reinforcer, some shaping of the 
behavior may be required. If the dog “begs” or tries to jump up on and 
paw the trainer, no reinforcement should be given. The snap and the food 
should be given only when the dog goes to the place where the trainer 
has decided that reinforcement will occur. Shortly, often within a 
minute, the trainer will be able to omit the food occasionally. The snap 
will have become a sufficiently strong reinforcer that it can be used alone. 
(During further training, it may occasionally be necessary to pair the 
snap with food to prevent complete experimental extinction of its 
secondary reinforcing property, but most of the training can be pursued with 
the snap as the only reward.) 

One will then be able to begin shaping the dog’s behavior into a 
pattern that resembles dancing. The method is often referred to as that 
of successive approximation. Suppose we decide that one dance step shall 
consist of shifting the dog’s weight back and forth from one forepaw to 
the other. By watching the dog’s feet, we can snap our fingers whenever 
we see the dog lift a paw. As soon as paw lifting occurs promptly, we 
can change to reinforcing only alternation between paws. Once that is 
established, we might choose to reward only a double shift. Finally we 
might reinforce only rhythmic changes in weight. We will have shaped 
a response that would not normally occur. Suppose we decide to train 
the dog to shift weight three times and then whirl around once. This 
could be accomplished by snapping our fingers only after the third 
weight shift until this dance step is occurring in triplets. Then we could 
begin to reinforce only those triplets that are followed by a tendency 
to turn to one side. Gradually we should be able to shape the turning into 
a full circle. We then would have chained the weight-shifting response 
to the turning response. By clever use of the method of successive 
approximation, and by chaining one response pattern to another, complex 
and highly unusual response patterns can be shaped. Limits on the 
complexity of the behavior that can be shaped are much more likely to be 
set by the limits and skill of the trainer than by the limited capacities for 
learning on the part of the animal being trained. 

“SUPERSTITION”

To say that reinforcement is contingent upon a given pattern of
behavior is only to say that the behavior was followed by reinforcement.
The contingency may involve a causal relation between the response and
the reinforcement. A steer put to graze in an old orchard may learn to 
shake the trees to obtain apples after an accidental bump or two against 
the trunk is followed by a shower of apples. In such a case, the behavior 
appears rational or even insightful. Reinforcement contingencies may be 
mediated by another organism, as is the case in which a trainer shaped 
the behavior. In that case the behavior may appear irrational and “super- 
stitious” as long as we are not aware of the role of the trainer. 





It is possible for complex patterns of behavior to be shaped and
chained by an accidental contingency between the behavior and the
reinforcement. Skinner (1948) has reported one demonstration of this
kind of accidental chaining of complex behavior. He placed hungry
pigeons in a box supplied with a food hopper. In one instance, the
hopper was wired to a clock which made the hopper available to the
pigeon for a five-second period every 15 seconds. The appearance of the
hopper was in no way contingent upon what the pigeon did. Yet when
pigeons were placed in this circumstance for a few minutes each day,
some of them developed complex and often bizarre patterns of behavior.
For example, one bird was reported to thrust its head repeatedly toward
one upper corner of the cage between reinforcements. Another made
counterclockwise turns around the cage, making two or three turns
between reinforcements. To an observer, the correlation between the
particular pattern of behavior and the reinforcement was not obvious.
The bird appeared to be performing complex patterns of behavior to
achieve food even though neither the physical environment nor a trainer
had established a causal link between the two.
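
The schedule described above can be sketched as a small simulation. It
is illustrative only. The hopper appears on a fixed 15-second clock
regardless of what the bird is doing, as in the text, but the list of
behaviors and the strengthening rule (whatever the bird happens to be
doing when food arrives becomes slightly more likely) are assumptions
made for this sketch and are not taken from Skinner's report.

```python
import random

# Hypothetical behaviors; the names are illustrative, not from Skinner (1948).
BEHAVIORS = ["head thrust", "turn", "peck floor", "stand still"]

def simulate(seconds, interval=15, seed=0):
    """Deliver food every `interval` seconds, independent of behavior."""
    rng = random.Random(seed)
    weights = {b: 1.0 for b in BEHAVIORS}   # equal starting tendencies
    deliveries = 0
    for t in range(1, seconds + 1):
        # The bird emits one behavior per second, in proportion to its weights.
        names = list(weights)
        current = rng.choices(names, weights=[weights[n] for n in names])[0]
        if t % interval == 0:
            # Food arrives on the clock; the behavior in progress is
            # strengthened by accident (assumed learning rule).
            weights[current] += 1.0
            deliveries += 1
    return weights, deliveries

weights, deliveries = simulate(seconds=3000)
total = sum(weights.values())
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {w / total:.2f}")
```

Because reinforcement never depends on behavior, any pattern that comes
to dominate in a given run does so by accident, which is the sense of
“superstition” used here.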

Anyone with a pet animal is likely to become aware of similar
accidental correlations. For example, most dogs prefer table scraps to
regular dog food. One dog that is permitted fairly free opportunities to
leave and reenter the house once found table scraps in her dish when she
reentered the house. Now, during dish-washing time, if table scraps are
not forthcoming, she asks to be let out, immediately asks to be
readmitted, and promptly looks in her food dish for scraps.

These illustrations make clear that an accidental contingency of
reinforcement coupled with the capacity of organisms to learn and to
continue to perform behavior under intermittent reinforcement can
combine to produce “superstitious” behavior. It is not difficult to find
such patterns in our pets, our friends, and ourselves.
Many patterns of social behavior may be learned through two closely
related processes called imitation and modeling.¹

IMITATION 

Imitation is not necessarily involved when the behavior of two
organisms happens to be the same; the two may have learned the same
behavior independently in the presence of the same cues. For this reason,
one must know the conditions in which behavior was learned in order to
know that it is truly imitative.

In order to study imitation, experimenters often construct a situation
in which an organism is rewarded only when it imitates the behavior of
another organism. Once the tendency to imitate is clearly established,
the subject can be tested to see if it will continue to imitate other
organisms in situations far different from the one in which the original
training was carried out. Those who study imitative behavior affirm that
it is often a generalized response and that, once it is acquired, it
will occur in a variety of situations.

The generality of imitation sets it apart from forms of learning that 
have been discussed previously. Even the most general of these, the 
acquisition of a learning “set,” is conceived to be relevant only to a 
particular type of problem. 

Miller and Dollard (1941), in their classic analysis of imitation, 
identify two ways in which behavior is learned through imitation: 
copying and matched-dependent behavior. In copying, the learner grad- 
ually brings his own performance into close approximation to that of a 
model. An example is instruction in handwriting as it was once given in 
the lower grades of school. The teacher provided the pupil with model 
handwriting, either in a book or on the blackboard, and the student 
copied the script over and over in an effort to reproduce it. In copying, 
it is possible for the subject to tell when he has produced an acceptable 
copy of a performance. By contrast, the subject is completely dependent 


¹ Social learning is treated extensively in Zajonc, Social Psychology:
An Experimental Approach, 1966, in this series.
upon the acts of the leader for cues as to what is appropriate in
matched-dependent behavior. Miller and Dollard undertook to demonstrate
this kind of learning in experiments with both rats and children.

The experimenters specified four necessary conditions for learning.
According to them, there must be a drive which impels the organism to
action. (According to Miller and Dollard, a drive is a strong stimulus,
and any stimulus may become a drive if it is strong enough.) There
must be cues which determine when the organism will respond, where
it will respond, and which response it will make. They also believed
that the organism brings to the situation a hierarchy of responses, at
least one of which must lead to reward if the organism is to learn.
Reward, which Miller and Dollard equate with reinforcement, is defined
as that stimulus or condition which reduces the intensity of the drive
stimulus. Previously neutral stimuli may become acquired or secondary
rewards through association, but a stimulus cannot be reinforcing except
in the presence of the appropriate drive.

According to Miller and Dollard, the essential difference between
matched-dependent learning and ordinary learning lies in the nature of
the cue. This difference can be made clear in a simple experiment.
Suppose a hungry rat, which we shall designate as the leader, is trained
to choose one of two alternative paths to food on the basis of the
location of a black, as opposed to a white, card. The drive is hunger;
the cue is the black card; the appropriate response is turning in the
direction indicated by the placement of the card; and the reward is
food. Suppose we then train a follower to make either the same choice as
the leader or the opposite choice. The drive, the response, and the
reward will be the same, but the cue will be different. To attain the
food reward, the follower animal must utilize the behavior of the leader
rat as the cue for either imitation or “nonimitation.” (Both imitation
and nonimitation are forms of matched-dependent behavior.) This design
was followed in the Miller and Dollard experiments, where the apparatus
was an elevated maze. The maze had a short starting stem, and a gap
separated the stem from two short runways, one leading right, the other
left. At the ends of the two arms there were clips to hold the black and
white cue cards and a sunken cup for the food reward for the leader
animals. To reward imitative behavior, the experimenter could open small
lids in the runway and expose food for the follower animals.

To establish that imitation was a generalized habit, it was necessary
to train three different groups of leader rats. One group of eight
albino rats was trained to discriminate between black and white cards.
For four of the animals, the black card was positive and the white card
was negative, and for the other four, the cues were reversed. The
albinos were trained to discriminate between the black and white cards
while the cards were shifted back and forth unpredictably so that the
only cue available to the animal was the presence of the positive card
on one of the arms. The rats usually required a large number of trials
before they learned to discriminate essentially without error. Once
these leader rats had learned to perform the task successfully alone,
they were given some additional training with a following rat placed
behind them on the starting stem at the beginning of each trial. This
additional training was necessary to prevent the introduction of
follower rats from disrupting the learned discrimination of the leaders.

For testing purposes, two more groups of leader rats were also
trained. One was a group of eight albino rats and the second was a group
of eight black rats. Both groups were trained to make a simple position
response. Half of each group was trained to go to the left for food,
half to go to the right.

Sixteen follower animals were then trained and tested in
matched-dependent behavior. Half were trained to be imitators and the
other half to be “nonimitators.” A leader rat was placed on the starting
stem and a follower was placed immediately behind him. If the follower
rat was being trained to imitate, whenever he chose the side chosen by
the leader rat, the experimenter raised the lid on the runway, which
exposed a food reward for the follower. If the follower was being
trained to deviate from the leader’s behavior, the reward was exposed
only when he chose the side opposite to that of the leader. A trial was
complete only when the follower made a correct choice. If the follower
made a wrong turn, both the follower and the leader were restored to the
starting stem and the process repeated until a correct choice occurred.
The training consisted of seven such trials a day for each animal for
12 days, or a total of 84 training trials. Only the first run in each
trial was scored as being either correct or incorrect. Thus each animal
could achieve a score for the day ranging from 0 to 7. Since there were
eight imitators and eight nonimitators, and each had seven trials per
day, there were 56 trials a day for each group. The results of this
training are shown in Figure 6.1. It is obvious that in the 12 days of
training, the imitators had learned to do whatever the leader did, and
the “nonimitators” had learned to do the opposite.
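
The trial counts in this design follow from simple arithmetic, which can
be checked with a short sketch. The constants come from the text. The
`percent_correct` helper and the example scores are hypothetical and
only illustrate one way a group's daily result might be expressed; the
text does not say how the scores were summarized.

```python
# Figures taken from the text.
TRIALS_PER_DAY = 7
TRAINING_DAYS = 12
ANIMALS_PER_GROUP = 8   # eight imitators, eight nonimitators

trials_per_animal = TRIALS_PER_DAY * TRAINING_DAYS          # 84
group_trials_per_day = ANIMALS_PER_GROUP * TRIALS_PER_DAY   # 56

def percent_correct(daily_scores):
    """Percent of first runs scored correct for one group on one day.

    daily_scores holds one score (0 to 7) per animal in the group.
    """
    return 100.0 * sum(daily_scores) / (len(daily_scores) * TRIALS_PER_DAY)

# Hypothetical scores for one day, for illustration only (not study data).
example_day = [7, 6, 7, 5, 7, 7, 6, 7]
print(trials_per_animal, group_trials_per_day,
      round(percent_correct(example_day), 1))
```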

However, one could question whether the follower rats had learned
to imitate or counterimitate or had simply learned to make the same
discrimination between the cards that served the leader rats as cues.
Even though 84 trials are usually insufficient for an animal to make the
black/white discrimination, the follower rats may have been aided in
making this discrimination by the presence of the leader rat. Indeed
they may have, but the question is really whether they can respond
appropriately to a leader rat when such a leader is the only available
cue.

Three tests of the follower rats’ acquired tendency to imitate or not
imitate were then arranged. In the first test, the black cards and white
Figure 6.1

Curves showing the learning and testing of imitation and
“nonimitation” (see text for description of the three test procedures).
(Drawn from Miller and Dollard. Copyright 1941 by The Yale University
Press. Drawn by permission of the authors and publisher.)


cards were removed, and new leader rats were substituted from the group
of albino rats that had been trained in position habits. The follower
rats were given seven test trials a day for two days, being first placed
behind left-turning and then right-turning leaders in an unpredictable
fashion. The only cue available for a correct response in this situation
was the behavior of the leader rat. This test is designated test A in
Figure 6.1. In test B, also consisting of seven trials a day for two
days, left-turning and right-turning black rats were substituted for the
albinos so that the experimenters could be sure that imitation and
nonimitation were not specific to white leaders. Test C was carried out
for two days with the same white rats used in test A, but with thirst
motivation rather than hunger. The results of the three tests are shown
in the right half of Figure 6.1.
There can be little doubt that the follower animals had learned to
imitate or not imitate. It is true that during all three test phases the
follower rats were rewarded for correct choices and not rewarded for
incorrect ones, and tests A, B, and C were therefore not unambiguous
tests of generalization. However, the behavior of the two groups of
follower rats continued to be quite different from each other when the
only cue in the situation that could be utilized by the follower rats
was the behavior of the leaders. Thus their behavior was truly
matched-dependent and represents a clear demonstration of a generalized
tendency to imitate or not imitate the behavior of others.

Miller and Dollard (1941) also report a similar study carried out
with first grade children. Two boxes were placed on chairs in a room
about ten feet from a starting line. A leader was brought into the room
and the experimenter pointed to one of the boxes. Then the follower was
brought in. The leader then went to the indicated box, raised the lid,
and took out a small piece of candy. The follower was then given his
turn. Followers were arbitrarily designated as imitators or
nonimitators, and each found candy in the appropriate box if his
behavior conformed to his designation. Imitators learned to perform
perfectly with an average total of 1.7 errors, and nonimitators made an
average of only .4 errors. A generalization test was then carried out
with four boxes. In this test, 75% of the 20 imitators chose the box
chosen by the leader and none of the nonimitators chose that box. This
study, which was carried out with all of the controls necessary, appears
to demonstrate that the paradigm for matched-dependent behavior could
also be used successfully with children.

What are the implications of such laboratory experiments for everyday
human behavior? Imitation is a very general kind of response for
which there are highly available cues. In imitation, we may find a
paradigm for learning generalized ways of behaving in a wide variety
of everyday situations. The reader who is interested in a fuller
treatment of theories concerning the role of imitation in human behavior
and other experiments with both human and animal subjects should see the
extensive appendix of Miller and Dollard (1941). Some aspects of this
experiment with first grade children are also discussed in Zajonc (1966).

MODELING 

Modeling is similar to imitation in terms of the experimental arrange- 
ments, except that the emphasis of modeling is on the production of a 
novel response by the organism. Thus, in modeling, considerable infor- 
mation is given the organism to make the desired response likely. If one 
is dealing with a verbal organism, it is often possible through verbal 
instruction to induce that organism to perform a response that would 
have a zero probability of occurrence without the instruction. However,
when one is dealing with a nonverbal organism or when the circumstances
are such that verbal information simply cannot be provided or is
undesirable, modeling offers a means of providing a stimulus containing
far more information than is provided in any other learning procedure
discussed in this volume. In the paradigm of modeling, the model
performs a complex response in the presence of the organism to be
trained, and the modeled behavior is rewarded when the response occurs.

A number of demonstrations of modeling as a learning procedure
have been developed by Bandura. He has made an effort to analyze the
manner in which the modeled behavior comes to be copied by the
observer, and he has found reason to suggest revisions in the nature and
significance of some of the basic concepts of learning theory.

When an observer copies the behavior of a model, according to
Bandura (Bandura and Walters, 1964; Bandura, 1965), three possible
means of bringing about the novel response in the observer may be
involved: (1) The response may be simply elicited in the observer. This
occurs when the observer already has the complete response pattern
immediately available. The effect of the model is only to facilitate the
occurrence of the response. (2) A new response pattern may be induced
by strengthening or weakening response elements that are already
present, resulting in the emergence of a new pattern of behavior. (3) The
observer may acquire a new response that did not exist in the response
repertoire, although the new response is certainly composed of
previously learned or available elements. Thus the novel character of
the response stems from the particular combination of elements.

In one study, Bandura and McDonald (1963) undertook to demonstrate
the superiority of modeling to the simple shaping procedures of
operant conditioning. In this study, young children were given stories
relating well-intentioned acts with drastic consequences and maliciously
motivated acts with minor consequences. They were to judge which was
the “naughtier thing.” They were then induced to make moral evaluative
statements counter to their own under three conditions. In one condition,
they observed adult models who expressed moral judgments, and the
children were reinforced for making similar judgments. In a second
condition, they observed the adult models but the children were not
reinforced. In the third condition, no models were provided, but the
children were reinforced for making moral evaluative statements exactly
as they were in the first condition. This study is discussed in some
detail in Zajonc (1966), but for present purposes, it is sufficient to
note that observation of the model, with or without reinforcement,
produced far more change in the behavior of the children than did simple
reinforcement alone. Bandura argues that the method of successive
approximation, as a means of shaping behavior, is inefficient compared
to modeling. In the method of successive approximation, the only source
of useful information is the contingency of reinforcement. In modeling,
the behavior of the model is a relatively rich source of information for
the observer.

The use of modeling as a training procedure has suggested two
revisions in the conceptualization of the learning process. These have
been emphasized by Bandura (1965). Since there is both a model and an
“observer” in the situation, one has the options of rewarding or
punishing the model, the observer, or both. Rewarding or punishing the
model, in contrast to the observer, is called vicarious reinforcement.
One can well ask what effect vicarious reinforcement of the model has on
the behavior of the observer, even when no reinforcement is applied to
the behavior of the observer.

A second theoretical issue stressed by Bandura (1965) is the role
of reinforcement in producing learning, as distinguished from its role in
producing performance. Miller and Dollard (1941) are representative
of a large group of learning theorists who make the assumption that
learning does not occur without the application of reinforcement.
Indeed, in operant conditioning, as promulgated by Skinner (1938, 1953),
reinforcement is the only significant variable that modifies behavior.
Bandura disagrees with this position. In contemplating a situation in
which a human subject is the passive recipient of verbal instruction or
a situation in which the behavior of a model is provided as a source of
complex information for the observer, Bandura concludes that
reinforcement is not necessary for learning. The observer learns the
response without either performing it or being reinforced for that
performance. Bandura characterizes traditional reinforcement theory as
assuming that the observer suspends learning until the reinforcement
arrives, however long that delay is. Bandura conceives of reinforcement
as having an influence only on the performance or nonperformance of the
response after it is learned.

Bandura (1965) has attempted to demonstrate both the efficacy of
vicarious reinforcement and the distinction between the role of
reinforcement in modifying performance and the role of reinforcement in
modifying learning. In this experimental demonstration, three groups of
children each observed a film in which a model exhibited novel physical
and verbal aggressive responses. Vicarious reinforcement was manipulated
by establishing three different conditions with three different films.
In one condition, the model was generously rewarded for its aggressive
behavior. In a second condition, the model was severely punished. In the
third condition, no consequences were shown for the model’s behavior.
After the children had observed the behavior of the model and the
consequences, they were given performance tests for aggressive behavior.
In these tests, the instances in which they performed imitative behavior
were recorded. No reinforcement was applied to the behavior of the
children in
tative behavior. The frequency of imitative responses increased in all
three groups of children, with the largest increase occurring in that
group of children who observed the model receiving punishment for the
aggressive behavior. Bandura argues that these children had learned by
merely watching the film showing the behavior of the model. Vicarious
punishment of the model had reduced the performance of the behavior,
while direct positive reinforcement served as an incentive in increasing
the frequency of the behavior.

Bandura’s demonstration of the effects of vicarious reinforcement
seems convincing. The application of rewards and punishments to the
model had effects on the behavior of the observers. Vicarious
reinforcement appears to play a role in providing information and
appears to operate as an incentive. However, Bandura’s demonstration of
learning in the absence of reinforcement, and thus by contiguity alone,
will not appear convincing to a reinforcement theorist. Many subtle
sources of reinforcement might have been operating in the situation
other than the reward applied or not applied by the experimenter.
Nevertheless, Bandura’s argument that learning may occur under direct
verbal instruction or from the mere observation of a model, without the
observer making the response or the experimenter applying reinforcement,
has sufficient merit to warrant consideration.

Imitation and modeling are important because they illustrate pos- 
sibilities of extending the learning principles described in earlier chapters 
to very generalized kinds of responses that are applicable in many situa- 
tions and to situations in which a complex stimulus is needed in order to 
induce a correct response. Thus, these principles of learning can be
extended to account for everyday learning. To some degree, we have all
learned to imitate and also to model our behavior after those who perform 
successfully. 
There is a growing tendency among psychologists who work in the
field of learning to employ mathematical models to summarize findings
and to substitute for theories couched in nonmathematical language. This
development is sufficiently extensive and important that the most
elementary treatment of experimental and theoretical developments in
learning should include an introduction to at least two or three types
of mathematical models of learning and to some of the language used in
this context.

A MATHEMATICAL MODEL AS A SUMMARY 
OF EMPIRICAL DATA 

Let us explore a simple illustration of the use of a mathematical
model as a summary of findings. Suppose that we carry out a study in
which a group of animals is induced to learn a complex maze problem to
obtain pellets of food. We arrange to have the animals go without food
for 22 hours before each daily training session, and we use four small
pellets as a reward on each trial. Let us choose to take running speed as
the measure. We might have chosen to use the number of errors, but our
maze is a long and complex one. If an animal makes an error, it will
increase the amount of time he takes to run the maze and thus reduce
his speed. Furthermore, even after he learns to run without error,
additional progress can be shown through further increases in speed. We
might choose to measure speed by taking the reciprocal of the running
time (1/time in seconds) so that the speed score will increase as
running time decreases.
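
The speed measure just described can be written out as a one-line
function. This is a minimal sketch; the running times are hypothetical
values chosen only to show that the score rises as the time falls.

```python
def speed_score(running_time_seconds):
    """Speed as the reciprocal of running time (1/time in seconds)."""
    if running_time_seconds <= 0:
        raise ValueError("running time must be positive")
    return 1.0 / running_time_seconds

# Hypothetical running times for one animal on successive trials.
times = [50.0, 40.0, 25.0, 20.0]
speeds = [speed_score(t) for t in times]
print(speeds)
```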

The data collected in this manner will then consist of a number of
scores. If we train 50 animals and run each animal on 10 trials on each
of 10 days, we shall accumulate 5,000 scores. One way to begin to
summarize the data is to take the average score for each of the hundred
trials by adding the individual scores of each of the 50 animals and
dividing by 50. We have now reduced the 5,000 individual scores to
100 mean scores. We have a record of the progress of the group in
learning the problem. In the process we have lost the record of
individual animals. It is also frequently true that in averaging the
scores, an additional assumption is made: that the learning process in
each individual animal is essentially the same as that in any other
animal. Thus if we plot a graph of mean scores by trials, the curve will
be assumed to reflect the character of the learning process in each and
every animal.

Suppose that the curve we have obtained is somewhat irregular,
rising in general, but showing some reversal in its progress. In order to
achieve a smoother curve, we might choose to take a mean of successive
blocks of 10 trials, and thus a mean of all of the trials run on each day
of the experiment. We shall now have reduced our 5,000 scores to 10
values, a very condensed summary. It is frequently true that in order to
justify this last step, it is assumed that the irregularities in the
previously plotted curve were due to the operation of chance factors.
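
The two reduction steps, from 5,000 scores to 100 trial means and then
to 10 daily means, can be sketched as follows. The scores are generated
artificially, and the generating rule and noise level are assumptions
used only to have something to average; they are not data from any
experiment.

```python
import random
from statistics import mean

N_ANIMALS, N_DAYS, TRIALS_PER_DAY = 50, 10, 10
N_TRIALS = N_DAYS * TRIALS_PER_DAY          # 100 trials per animal

rng = random.Random(1)

# Hypothetical speed scores: scores[a][t] is animal a's score on trial t.
# The generating rule is an assumption used only to produce example data.
scores = [
    [0.05 * (1 - 10 ** (-0.02 * (t + 1))) + rng.gauss(0, 0.003)
     for t in range(N_TRIALS)]
    for _ in range(N_ANIMALS)
]

# Step 1: mean across the 50 animals for each trial (5,000 -> 100 values).
trial_means = [mean(scores[a][t] for a in range(N_ANIMALS))
               for t in range(N_TRIALS)]

# Step 2: mean of successive blocks of 10 trials, one block per day
# (100 -> 10 values).
day_means = [mean(trial_means[d * TRIALS_PER_DAY:(d + 1) * TRIALS_PER_DAY])
             for d in range(N_DAYS)]

print(len(trial_means), len(day_means))
```

Each step discards information: the trial means no longer identify
individual animals, and the daily means no longer show trial-to-trial
irregularities.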

We now plot a curve such as that in Figure 7.1. It has only 10 points,
and when they are connected, we observe that it is a fairly smooth curve
that rises rapidly at first and then levels off. It is easy to conclude from
Figure 7.1

Plot of mean speed scores from hypothetical experiment (see text for
explanation)

the shape of the curve that the learning process gains rapidly at first
and shows progressively less improvement as practice continues. Notice,
however, that we have reached this conclusion on the basis of an abstract
of the original data. To attain the level of abstraction represented by
the plot of the ten points in Figure 7.1, theorists often make two strong
assumptions: (1) the learning process is the same in all animals, and
(2) variation in scores can be attributed to chance factors. It is quite
possible that neither assumption is wholly justified. Furthermore, in the
process of abstraction, we threw away a large amount of information.
From the 10 means alone we could not recover the original data.

After examining the curve (Figure 7.1), we might want to try
another method of reducing the mass of 5,000 scores to a simple
abstraction. We might decide to fit a curve to the 10 means by one of
several curve-fitting procedures that are only slightly more complex
than the process of obtaining means across animals and then means across
blocks of 10 trials.

The first step is to find a type of equation that fits the general
pattern exhibited by curves we obtained by the first process. A number
of equations might be used in fitting the data. One of these is a simple
exponential equation. We might select this particular equation partly
because it happens to fit many sets of data obtained by measuring the
process of physical growth. In fact, it is sometimes referred to as the
“growth equation.” One form of exponential equation that proves to fit
the data fairly well is the following:

Z = x(1 - 10^{-uT})    (1)

The relation of the various values in the equation to the learning
process as shown in the curve we obtained before may also be seen in
Figure 7.2. Z is the measure of performance, in this case, speed. T is
the ordinal number of the trial. Both Z and T are empirical variables for
which we obtained values in the original experiment. The x in the
equation is a value obtained in the curve-fitting process, and it
expresses the maximum performance we might expect to obtain if training
was to be carried on indefinitely. Thus x is the asymptote, the maximum
value of the curve at infinity, a theoretical limit the curve constantly
approaches. The u in the equation is also a value obtained in the
process of curve-fitting. It expresses the rate of learning or, in other
words, the rapidity with which the curve approaches the asymptote. It
determines the fraction of the remaining distance to x that will be
gained in a trial. If u is constant, the gain will be smaller on each
successive trial because the remaining distance to the asymptote, x, is
smaller. The operation of u is indicated in the figure only indirectly
by the constant proportional but different absolute amounts of gain
between the first two data points as contrasted with the gain between
the second and third points. If u is large, the curve will rise sharply
at the beginning. If u is small, the curve will appear flatter and will
approach the asymptote more slowly. The 10 in the equation reflects the
fact that logarithms to the base 10 are to be employed in the
curve-fitting process.¹
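
The roles of x and u in Equation 1 can be checked numerically. The
sketch below uses hypothetical values of x and u chosen only for
illustration. It shows that the absolute gain shrinks from trial to
trial while the share of the remaining distance gained on each trial
stays constant. With the base-10 form of the equation, that constant
share works out to 1 - 10^{-u}, so it is fixed by u without being equal
to u.

```python
def performance(T, x, u):
    """Equation 1: Z = x * (1 - 10 ** (-u * T))."""
    return x * (1 - 10 ** (-u * T))

# Hypothetical parameter values chosen only for illustration.
x, u = 0.05, 0.1

for T in range(1, 5):
    z_now = performance(T - 1, x, u)
    z_next = performance(T, x, u)
    gain = z_next - z_now
    fraction = gain / (x - z_now)   # share of the remaining distance gained
    print(T, round(z_next, 4), round(gain, 4), round(fraction, 4))
```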



Figure 7.2

Graphic form of Equation 1 (see text for explanation)

It should be noted that u is firmly identified with learning and x is
firmly identified with performance. Learning is defined as the change in
performance that occurs with experience. In expressing ultimate
performance, x should be identified with those variables in a learning
problem that remain constant. Whether some experimental manipulations
change u, x, or both, remains an empirical question.

If we fit the data of Figure 7.1 to Equation 1 to obtain appropriate
numbers to represent x and u, the result might be the equation in Figure
7.3. Since we have obtained values for x and u, it is now possible to
choose any value we wish for T, then solve for Z. We can calculate a 
number of Z values and plot a smooth curve to represent the learning 
process. The result will appear as the curve in Figure 7.3. The curve and 
the equation are both simple abstractions from the data, and as such are 
simple summaries of the data, very much as the plot of mean values in 
Figure 7.1 is an abstraction and summary. 

¹ For a programmed introduction to common logarithms, see Lane and Bem, A
Laboratory Manual for the Control and Analysis of Behavior, 1965, in this series.

Exponential equation resulting from fitting a curve to
the data plotted in Figure 7.1 (see text for explanation)


It should be noted, however, that in choosing the exponential or
growth equation, we have made at least one further assumption about the
learning process. While it may not be obvious, we have made the assump-
tion that learning is a continuous process rather than a discrete one.
We have borrowed from the calculus the underlying assumption of
continuity. From the equation, it is possible to calculate the performance
on trial 17.6 even though .6 of a trial makes no sense. The equation
permits us to extrapolate the curve back to zero training or even beyond
to negative values of training. In essence, in choosing this equation form
and in making the relevant assumptions, we have denied the possibility
that learning proceeds in an all-or-nothing fashion or that it proceeds in
finite and discrete steps. Alternative equation forms might have been
chosen that involve quite different assumptions; one of these alternatives
will be discussed later.

One may regard either the formal verbal statements that result in
the formulation of Equation 1 or the equation itself as a mathematical
model of the learning process. It is a summarizing model to the extent
that we use it only to summarize a set of data. But it is rarely used as a
simple summary. More commonly the mathematical assumptions involved
in summarizing are transferred to the learning process itself. We have
made at least three very strong assumptions: (1) that the learning process
is essentially the same in all 30 animals,² (2) that irregularities in progress
can be attributed to random or chance factors, and (3) that the learning
process is continuous (sometimes referred to as incremental) rather than
discrete. It should be noted that verbally expressed learning theories
frequently involve similar strong assumptions that are not clearly
expressed and are frequently unrecognized. Thus, one of the virtues of
building mathematical models is that assumptions frequently become
more explicit.
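
The first assumption carries more weight than it may seem to, because an
average of exponential curves with different rates is not itself an
exponential curve. The sketch below illustrates this under stated
assumptions: Equation 1 is taken to have the form Z = x(1 - 10^(-uT)), and
the two rates, .5 and .05, are hypothetical values for a fast and a slow
learner. For a single exponential the remaining distance to x shrinks by
the same ratio on every trial; for the averaged curve it does not.

```python
def curve(T, x, u):
    """Assumed form of Equation 1: Z = x * (1 - 10 ** (-u * T))."""
    return x * (1.0 - 10.0 ** (-u * T))

def step_ratios(values, x):
    """Ratio of the remaining distance to x on successive trials.
    For a single exponential curve this ratio is the same on every trial."""
    remaining = [x - v for v in values]
    return [remaining[i + 1] / remaining[i] for i in range(len(remaining) - 1)]

x = 1.0
trials = range(6)
fast = [curve(T, x, 0.5) for T in trials]    # hypothetical fast learner
slow = [curve(T, x, 0.05) for T in trials]   # hypothetical slow learner
mean = [(f + s) / 2 for f, s in zip(fast, slow)]  # averaged "group" curve

fast_ratios = step_ratios(fast, x)
mean_ratios = step_ratios(mean, x)
print([round(r, 3) for r in fast_ratios])  # constant from trial to trial
print([round(r, 3) for r in mean_ratios])  # changes from trial to trial
```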

DEVELOPMENT OF THE DETERMINISTIC MODEL 
INTO A GENERALIZED, PREDICTIVE MODEL

A part of the usefulness of a mathematical model will be determined
by its generality. There are two ways we can proceed to explore the
generality of Equation 1 as a general model of the learning process.
Since it is only a summary of the data from one experiment, we may
wish to prove that the values we obtained are reliable estimates. We
can repeat the identical experiment a number of times, fitting curves to
each new set of data until we become convinced that our values are
reliable estimates of the rate of learning and the ultimate performance
to be expected. This procedure is not trivial, but it is uninteresting.

A more interesting path to generality is to explore what happens to
x and u when we deliberately change the nature of the experiment, or
vary one of the conditions of learning. Suppose that we decide to run
another group of 50 animals with the conditions exactly the same except
that instead of being deprived of food for 22 hours during the training,
they are deprived for only three hours and can therefore be assumed to
be less hungry. If we then employ the new set of data, we might obtain
the lower curve shown in Figure 7.4. The most obvious difference
between the two curves is the big difference in running speed that the
two groups seem to have adopted. This difference is reflected in the
difference between .086 and .045 as estimates of x. It is obvious, therefore,
that the difference in deprivation level between the two groups resulted
in different levels of performance. On the other hand, the fact that the
two estimates of u, reflecting the rate of learning, are essentially iden-
tical seems to establish that the difference in deprivation level did not
affect the rate of learning. Furthermore, the estimate of the constant at
the end of each equation is the same, and this has another interesting
implication. Since the constant, in this case .016, indicates the point at
which the curve reaches zero on the abscissa, a point which reflects
running speed before there is any training, the identity of this value in
the two equations implies that initial running speed did not differ
between the two. It is thus implied that the difference in deprivation
level did not produce a difference in general activity or, more strictly,
the time taken to get from the start to the goal box before the animals
first experienced the food incentive.

² An assumption even more firmly grounded by the fact that if the animals learned at
different rates, curves of individual performance could be fit by exponential equations,
but a curve that is an average of exponential curves with different exponents is not an
exponential curve.

While the data on which Figure 7.4 is based are purely hypothetical
and the equations in the figure are fabrications, the results are not at all
unlikely. Furthermore, rather than vary the time of deprivation, we
might have chosen to vary the amount of reward. In the original experi-
ment we described conditions of training as involving 22 hours of
deprivation of food and four small pellets reward on each trial. We
might have run still a third group with the same time of deprivation, 22
hours, but with only one small pellet as a reward on each trial. It is quite
possible that the data would be fit by the following values in the
equation:

        Z = .058(1 - 10^{-uT}) + .016

If we had actually obtained such values, and again the result is a quite
possible one, we would have noted that both the origin of the curve and
the rate of approach to the asymptote did not differ materially from
similar values in Figure 7.4, but that the estimate of the asymptote, .058,
was different. We should then conclude that variation in the amount of
reward would produce variation in the ultimate level of performance,
but would not change the rate of learning. In terms of Equation 1,
variation in deprivation level and amount of reward both produced
variation in the estimates of x, but neither produced variation in u.

What started as a simple effort to summarize the data in equation 
form has already developed into a primitive predictive model and into a 
theory of learning as well. Before attempting to specify the characteristics 
of Equation 1 as a predictive mathematical model, it is important to
distinguish the various languages which have been and could be used 
to talk about the various aspects of the situation. 

One language set is, of course, that of psychology. Terms such 
as learning, rate of learning, motivation, performance, and reward are 
included in the basic language of psychology. The language of psy- 
chology should be set apart from the languages of experimentation, logic 
(or philosophy of science), and mathematics, which are ancillary lan- 
guages used to talk about psychology. 

In the language of experimentation, the number of trials, the hours 
of deprivation, and the amount of the incentive are all independent 
variables, while the running speed is a dependent variable. The first three 
terms specify the arrangements of conditions we made to put the question 
to nature. The dependent variable, running speed, is nature's answer. 

In the language of logic (or philosophy of science), the Z and T of 
Equation 1 are empirical or observable constructs; capital letters are 
used to indicate this status. The T is a manipulated variable, because it 
is under the direct control of the experimenter. Z, on the other hand, is 
a performance variable because it is the empirically measured per-
formance. The u and x of Equation 1 are essentially theoretical constructs,
and most often are intervening variables. They are printed in lower case
to distinguish them from more directly measured empirical constructs. 
Figure 7.5 shows these relationships. Solid lines in the diagram indicate 
that the necessary connections between empirical and theoretical con- 
structs, namely, appropriate coordinating definitions, are implicit in the 
equation. It is also true that the equation form establishes the necessary 
implicit definition which relates x and u. Further discussion of the lan- 
guage of the philosophy of science can be found in another volume in this 
series (Walker, Psychology as a Natural and Social Science, pending). 

In our discussion of the hypothetical experiment, two values of x
were established on the basis of two values of time of deprivation of
food, three and 22 hours. Furthermore, it was assumed that we might
expect to obtain two values of x with two different amounts of food
reward. If we designate H to stand for hours of deprivation, and R to
stand for the amount of reward, it would be possible to write equations
which related H and R to x and Z in which u was found to be unaffected
and thus a constant. Only a linear equation could be fit to either, since we
had obtained only two values for each variable. Obtaining data for other
values of H and R would improve our knowledge of the shape of the
relationship and our confidence in statements concerning the shape, but
would not change the character of the logical relationships between
empirical constructs and their associated theoretical counterparts.

Figure 7.5

Diagram showing the correspondence between the
language of experimentation, the language of logic
or philosophy of science, and the variables of Equa-
tion 1 (see text for explanation).
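
The point about two values can be made concrete. The sketch below passes a
straight line through the two hypothetical pairs given in the text, three
hours with x = .045 and 22 hours with x = .086. The variable and function
names are assumptions made for illustration, and the 12-hour group is
hypothetical. With only two points the line fits exactly, which is why the
data say nothing about whether the true relationship is straight or curved.

```python
# Two hypothetical observations from the text: hours of deprivation H
# and the fitted value of x for each group.
points = [(3, 0.045), (22, 0.086)]

(h1, x1), (h2, x2) = points
slope = (x2 - x1) / (h2 - h1)   # change in x per additional hour of deprivation
intercept = x1 - slope * h1     # value of x the line implies at zero hours

def predicted_x(hours):
    """Straight line through the two points. Two values cannot confirm
    or rule out a curved relationship between H and x."""
    return intercept + slope * hours

print(round(slope, 5), round(intercept, 5))
print(round(predicted_x(12), 4))  # interpolation for a hypothetical 12-hour group
```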

If Equation 1 is to be used as a predictive mathematical model, then
the discussion is to be carried on in the fourth language, that of mathe-
matics. In that language, T, Z, x, and u are all variables because they can
assume a variety of values. In addition, x and u are called parameters.
Equation 1 might have been written with the value c as a terminal
member. It is a constant in the curve-fitting process, the value of the
point at which the curve crosses the ordinate.

While Equation 1 as a summarizing model started with a set of data
and constituted an abstraction from it, the same equation as a predictive
mathematical model is an abstraction from which one predicts data.
The interest in mathematical models of learning arises from a considera-
tion of the mathematical properties of the model itself. The model is
taken as fixed, and the effort is to generalize it as far as possible without
giving up its essential character. As indicated previously, Equation 1 can
be generalized in two or more ways. The experiment can be repeated and
the curve-fitting process carried out repeatedly until one is confident that
the obtained values for x, u, and c are stable values. The curve may be
extrapolated to values of T that were not tried, such as 200 or 1,000
trials. Then predictions may be made from the model concerning the
values of Z, the performance, for these values of T. Finally, empirical
tests of the prediction may be made. A third procedure would involve the
manipulation of other empirical constructs. We discussed the manipula-
tion of time of food deprivation and of the amount of the reward. Both
were found to affect x but not u. Both findings increased the generality
of the model, but because they gave a satisfactory fit to the data, they
did not force a change in its character. This process might be extended
to other values of these variables, or to the manipulation of new variables,
such as difficulty of the problem, where we probably would have found
that both x and u would have been affected by the same manipulation of
an empirical variable. The model, however, would have remained
inviolate.

Difficulty with this simple model would have been encountered if we
had happened to try a procedure yielding results that could not be fit by
a simple exponential equation. For example, suppose that we had
actually tested the model at 1,000 trials and found that, instead of
approaching more and more closely to the asymptote, the curve began
to fall. Or suppose we constructed a difficult problem and found a curve
that rose to an apparent maximum value and then, after an extended
period of no apparent change, began to rise once again. Only then would
it have been necessary to reconsider the nature of the model itself, since
such results could not be fit by an exponential equation.


THE STIMULUS SAMPLING MODEL

The irregularities in the behavior of individual animals and their
unsteady progress toward their ultimate performance levels were assumed
to be attributable to uncontrolled, random, or chance factors and were not
taken explicitly into account in the deterministic model. To mathematically
oriented people, the irregularities suggest the operation of probabilistic as
opposed to deterministic processes. If one views the irregularity in
performance as the basic characteristic of the behavior, then one might
prefer to choose a stochastic or probabilistic model. Stochastic models are
statistical in character and involve uncertainty in outcome as opposed to
the certainty involved in deterministic models.

A prominent example of a stochastic learning theory is the stimulus
sampling theory of Estes (1959). This theory makes use of the mathe-
matics of set theory and probability theory in combination. While
Estes' theory, in its mathematical form, is complex and highly developed,
a few simple elements of it can be extracted to demonstrate some of the
characteristics and properties of stochastic models.

The verbal statement of the essence of Estes' learning theory is
simple. It is the elaboration of the stochastic model that becomes com-
plex. From the standpoint of the model, the stimulus universe can be
conceived of as composed of elemental stimuli. The response universe
may likewise be conceived of as a large population of elements. Both
stimuli and responses are thought of in terms of sets and subsets of
elements, with the connections between them described in probabilistic
terms. Both learning (the change in connections between stimulus ele-
ments and a response) and performance (the production of a response
on any given trial) are described in terms of probabilities. Estes' theory
is built on five simple postulates or propositions, which are obviously
chosen because of their mathematical properties. Paraphrased from Estes
(1959), they are:

Response. The entire universe of responses is divided into mutually exclu-
sive and exhaustive categories. Each category is described as a subset and
is characterized as having a probability of occurrence; the probabilities
of the subsets sum to 1.00.

Stimulus-Response Relations. The stimulus set is partitioned by the
responses to which the elements are attached. Every element of the
stimulus set is attached to one and only one response. Probability of
response is equal to the proportion of the elements attached to the
response in question.

Learning. All elements present on a trial on which the response occurs
and is reinforced are thereby connected to the response. If the element
is not present, or if no reinforcement occurs, there is no learning, no
change in connection.

Stimulus Fluctuation. Gradual learning and response variability are both
accounted for in terms of the statement that only a portion of the stimuli
which might be sampled is actually sampled on a given trial. Further-
more, the exchange between the set sampled and the set momentarily
unavailable for sampling requires time.

Stimulus Sampling. To every element of S is associated a probability of
being sampled, θ.

In Estes' model the definition of reinforcement is implicit in the
postulates. For each response class there exists a class of events which
are reinforcing. There is no independent definition of reinforcement.
Whichever event is found to increase the probability of the evocation of a
response in the presence of a given stimulus set is, by this definition, a
reinforcing event.

The full stimulus sampling model as described in Estes (1959) has
at least 19 variables or theoretical terms. A minimum of eight is nec-
essary to develop a simple formula for a learning curve similar to
Equation 1. Here are the eight terms:

S = the total set of stimulus elements.

s = the mean stimulus sample size. Thus s is a subset of S and a rec-
ognition of the condition that only a subset of the total stimulus
set will be present on any given trial.

N = the number of elements in S.

θ_i = the probability that the ith element (any given element) will be
sampled on any given trial.

A_i = a set of r mutually exclusive and exhaustive response classes.

E_i = a set of reinforcing events, one [event] corresponding to each
response class.

n = the ordinal number of the trial.

p_n = the probability of A_i on trial n in the presence of the stimulus sit-
uation, S.

In order to generate a smooth curve of the kind which is character-
istic of Equation 1, it is necessary to make several strong, limiting
assumptions. Some of the differences between the deterministic and the
stochastic models may become clear from this process. One assumption
might be that all θ values are the same. θ_i is the probability that a given
stimulus element will be sampled, and the probability of sampling can
differ between elements. However, if we make the assumption that all
elements of S are equally likely to be sampled, the θ will be equal for all
elements. If, in addition, we deal with mean values of θ, then the size of
s will be precisely determined by θ. Since N is the number of elements
in S, then

        s = Nθ.

An even stronger assumption may be made concerning p. While s
concerns the size of the sample of S drawn on each trial, p represents
the proportion of that sample which is associated with A_i through
previous reinforcements. To generate a smooth curve, it is necessary to
assume that the proportion of the sample drawn on each trial that is
connected to A_i is identical to the proportion of the total population, S,
that is associated with A_i. Given these assumptions, Equation 2,

        (2)

can be used to generate a smooth curve. It should be noted that because
of these restrictive assumptions, the lefthand term of the equation could
have been either Nθ or s, since both have been made equal to p by our
assumptions. If we choose .5 as a value for θ, for the sake of simplicity,
then a series of values can be calculated for p by solving Equation 2 for
a series of values of n. Figure 7.6 is a plot of values for p through five
trials. It can be seen that in this instance the curve that is generated
has properties that are identical to the curve generated by Equation 1.
Because the curve rises to an absolute limit of a probability of 1.0, the
value 1 in Equation 2 is an asymptote just as x described the asymptote
in Equation 1. The rate of rise to the asymptote is determined in Equa-
tion 2 by the power of the quantity (1 - θ), which decreases as n increases,
changing the value of p_n by a fixed fraction of the remaining distance to
the asymptote, just as the curve generated in the case of Equation 1
proceeded with successive units of training by a fixed fraction of the
remaining distance to x as defined by the exponent, uT. It should be noted
that Estes' model can be developed to deal with other dependent variables,
such as latency of response, in which the asymptote is free to vary with
the conditions of the experiment and is not arbitrarily limited, as is true
when probability of response is the performance term.

Figure 7.6

Graphic plot of solutions for Equation 2 for six values
of n (see text for explanation)
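
The correspondence between the two equations can be illustrated with a
short calculation. This is a sketch under explicit assumptions: Equation 2
is taken to have the form p_n = 1 - (1 - θ)^n with no connections before
the first trial, and Equation 1 is taken to have the form
Z = x(1 - 10^(-uT)). Both forms, and the function names, are assumptions
made for illustration. Only θ = .5 and the six values of n come from the
text.

```python
import math

def p_sampling(n, theta):
    """Assumed form of Equation 2: p_n = 1 - (1 - theta) ** n."""
    return 1.0 - (1.0 - theta) ** n

def z_exponential(T, x, u):
    """Assumed form of Equation 1: Z = x * (1 - 10 ** (-u * T))."""
    return x * (1.0 - 10.0 ** (-u * T))

theta = 0.5                    # value chosen in the text
u = -math.log10(1.0 - theta)   # rate that makes the two curves coincide

p_values = [p_sampling(n, theta) for n in range(6)]        # six values of n
z_values = [z_exponential(n, 1.0, u) for n in range(6)]    # asymptote x = 1

print(p_values)
print([round(z, 5) for z in z_values])
```

Under these assumptions each trial closes half of the remaining distance to
1.0, and the two lists agree to within rounding error. That is the sense in
which the restricted stochastic model behaves like the deterministic one.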

The fact that the two equations are mathematically identical arises
from the restrictive assumptions that were made. Without them, Equation
2 describes not only the mean values of the learning curve, but the dis-
tribution of values to be expected. The increase in the proportion of the S
population associated with the response would depend on the number of
unassociated elements that happen to be in the particular sample of
elements, s. Furthermore, the performance would not necessarily be an
exact reflection of the amount of learning. If a particular sample set was
unrepresentative of the proportion of learned connections, the perform-
ance might reflect a proportion that is greater or lesser than the actual
proportion of connected elements. Thus, without the restrictive assump-
tions, a curve plotted from Equation 2 would rise irregularly unless a
sufficient number of calculations were made from the equation in a
manner equivalent to running a large number of animals, so that neither
the rate of learning nor the degree of representation of learning in
performance departed from a mean value. It should be clear from this
discussion that Estes' stimulus sampling model makes a clear distinction
between learning, as represented by θ, and performance, as repre-
sented by p.

The difference between learning and performance in the Estes' model
can be further expanded by noting that while learning may proceed in
one direction only, performance can show reversals of direction. This
condition follows in part from the postulate that connections remain
unchanged if there is no reinforcement. Thus in a learning situation, the
number of elements of S connected to the response may increase in
varying amounts from trial to trial depending on the number of uncon-
nected elements in the sample s. But they may not decrease. To the
extent that the particular sample s underestimates the proportion of
elements in S that are connected to the response, there exists the possi-
bility of a drop in the performance curve.
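
The contrast between one-directional learning and reversible performance
can be seen in a small simulation. The sketch below is a simplified reading
of the postulates, not Estes' full model: it assumes that every element is
sampled independently with the same probability θ, that the response occurs
and is reinforced on every trial, and that performance is the proportion of
the sampled elements already connected. The function name, the values of N
and θ, and the random seed are all hypothetical.

```python
import random

def simulate(n_elements, theta, n_trials, seed=0):
    """Simulate one learner under a simplified stimulus sampling scheme."""
    rng = random.Random(seed)
    connected = [False] * n_elements   # which elements of S are connected
    learning, performance = [], []
    for _ in range(n_trials):
        # Each element of S enters the sample with probability theta.
        sample = [i for i in range(n_elements) if rng.random() < theta]
        # Learning: proportion of all of S connected before this trial.
        learning.append(sum(connected) / n_elements)
        # Performance: proportion of this sample that is already connected
        # (set to 0.0, arbitrarily, when the sample happens to be empty).
        if sample:
            performance.append(sum(connected[i] for i in sample) / len(sample))
        else:
            performance.append(0.0)
        # Reinforcement connects every sampled element; none is ever lost.
        for i in sample:
            connected[i] = True
    return learning, performance

learning, performance = simulate(n_elements=20, theta=0.2, n_trials=15)
for trial, (learned, shown) in enumerate(zip(learning, performance), start=1):
    print(trial, round(learned, 2), round(shown, 2))
```

In any run the learning column can only hold steady or rise, while the
performance column may rise or fall depending on which elements happen to
be sampled on a given trial.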

It should also be noted that a curve generated from Equation 2 can
depart from the simple exponential form in a number of ways. One
example might involve the assumption that some elements of S are
readily available for sampling while others are markedly less available.
The result would be a performance curve that rises sharply and then
proceeds over the remaining distance at a leisurely pace. The same curve
could be fit with an exponential equation only by developing a more
complex expression for the exponent.

As stated earlier, a basic and fundamental difference between the
two models that have been discussed is the fact that the deterministic
model assumes that the learning process is continuous, while the stochastic
model, of which Estes' stimulus sampling model is an example, permits
learning to proceed in discrete steps. A deterministic model predicts
mean values, while the stochastic model describes the distribution of
performance scores to be expected as well.

Thus, under certain restrictive assumptions and with a large number
of elements, Estes' model can appear to behave as does a deterministic
model. However, Estes' model can be made to deal with many kinds of
data that would be difficult for a deterministic model. Furthermore, the
development and application of such stochastic models have led to
ingenious experimentation.

It is not a characteristic of mathematical models of learning, as such,
but it is a characteristic of psychologists who build mathematical models
of learning, that the model tends to become more important than the
data. One can become quite interested in mathematical models of learning
in a general sense and expend great effort in developing their properties.
This interest can be accompanied by varying amounts of interest in data,
ranging from none at all to sufficient interest to produce strong programs
of research, such as is the case with Estes and his students. However,
when theories are cast in mathematical terms, there is a general tendency
for the relationships between theoretical and empirical variables to be
somewhat less formal than is likely to be the case with nonmathematical
learning theories.


Figure 7.7

Diagram of some of the interrelationships between
theoretical constructs and the data level in Estes'
stimulus sampling theory (see text for explanation).
[Diagram columns: manipulated variables (number of food
rewards, number of unconditioned-stimulus presentations,
intensity of conditioned stimulus); intervening variables
(boxed theoretical symbols, including θ, p, and N);
performance variable (relative frequency of response).]


Figure 7.7 is a diagram of a few of the concepts we have just dis-
cussed. Estes' theory has no empirical constructs as such. The four
symbols in the box, as well as the remainder of the 19 symbols listed in
Estes (1959), are all theoretical variables. The solid lines in the diagram
represent the fact that the implicit definitions between theoretical terms
are completely specified in the mathematical relationships indicated in
the formula employed by Estes, of which Equation 2 is one example. As
indicated in the diagram, n may at one time be interpreted as referring
to the number of food rewards, and at another time as referring
to the number of unconditioned-stimulus presentations. Manipulation of
the intensity of the conditioned stimulus may simultaneously increase the
number of elements in S, in N, and in the sampling probability, θ. Thus
the tendency of many theorists is to interpret the model rather directly
in terms of data rather than to indicate formal empirical constructs and
coordinating definitions between the constructs and the theoretical terms.






As is always the case with probabilities, p is an unrealized potential, and
thus a theoretical term in the language of theory. Probability of response
is usually evidenced in terms of frequencies, and the relationship between
p and frequency of response is also a matter of “interpretation” rather
than formal statement. It appears generally true that the relationship
between a mathematical model and the performance variables relevant
to it is no more rigorously stated than the relationships between the
theoretical terms and the manipulated variables. The general weakness
of relationship between theoretical terms and empirical variables is rep- 
resented in Figure 7.7 by the broken lines connecting the two classes 
of variables. 

COMPUTER SIMULATION OF LEARNING 

Machines can be induced to simulate the learning process. All that
is required is to provide a set of operations which will result in a per-
formance that has the essential characteristics of a learning performance.
A number of devices that appear to learn have been constructed to
perform simple biological-like functions. One of the most amusing is a
“turtle-like” device described by Walter (1951). An earlier version of
this machine had been constructed of a tortoise-shaped shell containing
two motors, a photoelectric cell, two vacuum tubes, and a touch contact.
The circuit provided made it seek light of moderate intensity and avoid
both strong light and physical objects. The later version contained
additional circuitry to provide for simple learning, and Walter refers
to this model as Machina docilis. A learning machine may be regarded
as a specialized computer. One possible benefit from the stuntlike effort
to construct such a machine is detailed in Walter's report. He wished to
build an analog of simple conditioning. Since the machine was attracted
to moderately intense light, he wished to induce it to be attracted to a
whistle by repeated presentation of the whistle just before moderately
intense light was presented. In order to accomplish this simple effect, he
found it necessary to provide seven distinct operations. (1) The beginning
of a specific stimulus had to be sharply distinguished from the absence
of the stimulus; it was the change in the stimulus that was important
rather than its presence. (2) The effects of the neutral stimulus had to
be extended in time; the machine had to remember it long enough to
make an association. (3) The relevant stimuli had to be mixed in such a
way that simultaneous presence of effects could be detected. (4) The
effects of the two stimuli had to be summated or integrated to form a
consolidated stimulus. (5) The circuit had to be designed to activate
memory only when the frequency of paired occurrence of the stimuli
exceeded some value of chance coincidence of the two. (6) Memory was
programmed to decrease in strength with the passage of time. (7) Pro-
vision was then made for the necessity of testing the correlation between
the two stimuli in comparison with other stimuli. Each of these seven
requirements was met by the construction of a circuit element to carry
out the function. The circuits Walter used are illustrated in the original
article (1951). A major benefit of such an effort is apparent in the list of
the seven necessary characteristics. It was necessary for Walter to be
highly analytical in designing a machine to perform what had appeared
to be a simple learning process. Some of the complexities of conditioning
became evident only when he tried to simulate it mechanically.

However, it isn't necessary actually to build a machine to simulate
a learning organism. A calculator or computer can be used to make the
simple arithmetic calculations necessary to determine the values of an
equation without the specialized machinery needed to construct a
Machina docilis. The construction of a computer program can be expected
to force the writer of a program to be as analytical and specific about
the nature of the process to be simulated as the construction of a learning
machine forced Walter to be.

Computer simulation of psychological processes, such as learning
and performance, can generally proceed in either of two ways. One can
construct a mathematical model, usually stochastic, and use the computer
to make arithmetic calculations; or one can construct an information
processing model based on more flexible utilization of the computer's
capacity to handle logical operations. Gregg and Simon (1966) refer to
the latter alternative as the construction of a “process model.” An
exploration of process models will not be undertaken here, although it
is possible that they may come to be the most fruitful employment of the
computer in simulation.

One essential value of the computer as an instrument of simulation 
arises from its enormous speed of operation. In a matter of seconds, it 
can perform calculations which might take days if done by hand. In 
fact, the speed of the computer makes tasks of computation practical 
which would not even be considered if they had to be done without 
the computer. 

For the purposes of illustration, a very simple computer model 
of learning was programmed and run on a large digital computer by 
S. H. Robinovil'. The program was devised to simulate rats learning to 
go to one side of a simple T maze for food reward. The tendency to go 
to each side was assumed to be equal, and thus to have a value of .500 
at the outset in terms of probabilities. Since the behavior of rats in a 
T maze is variable, it was necessary to build variability into the 
probabilities of going to either side. It was arbitrarily decided to subject the 
momentary value of turning to either side to variation with a standard 
deviation of .200. Standard deviations were used because of the 
availability on the computer of a subroutine for calculation of the standard 
deviation which could be used as a part of the input, and a value of .200 
was chosen to permit occasional errors even after the tendency to go to 
the rewarded side was quite strong. In essence, the computer was asked 
to make a random selection of a value for each tendency on each trial 
from a distribution of values around the mean value of each tendency. 
Then the computer was asked to decide, on the basis of the higher 
value, which way the animal might be expected to turn. Using this 
procedure, the animal would have made equal choices to the two sides 
over a long run if nothing happened to change the value of either 
tendency. 
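
The choice rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a translation of the original program: the function name choose_side, the fixed seed, and the use of Python's random.gauss in place of the computer's own random-normal subroutine are choices made here for the example.

```python
import random

def choose_side(means, sd, rng):
    """Draw a momentary value for each side; return the side (1 or 2) with the higher value."""
    value_1 = rng.gauss(means[0], sd)  # momentary tendency to turn to Side 1
    value_2 = rng.gauss(means[1], sd)  # momentary tendency to turn to Side 2
    return 1 if value_1 > value_2 else 2

# Assumed setup: both tendencies at .500, standard deviation .200, no learning.
rng = random.Random(0)
choices = [choose_side((0.5, 0.5), 0.2, rng) for _ in range(10_000)]
share_side_1 = choices.count(1) / len(choices)
print(f"share of choices to Side 1: {share_side_1:.3f}")
```

With equal tendencies and no learning, the share of choices to Side 1 should stay close to one half over a long run, which is the baseline behavior the text describes.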

Side 1 was chosen as the side for the computer to reward. For the 
sake of simplicity, it was decided that the computer should learn nothing 
from an incorrect choice. Therefore, it was instructed to make no change 
in the value of the tendency to turn to Side 2 when Side 2 was actually 
chosen. On the other hand, the computer was instructed to reward the 
rat whenever it chose Side 1. In order to represent the effects of rein- 
forcement, the computer was instructed to add to the probability of 
choosing Side 1 an amount to be calculated. This value was found by 
taking the difference between the existing value of Side 1 tendency and 
1.0 (the asymptote), multiplying it by .015, and adding the product to 
the Side 1 value. This operation is identical to the process of increasing 
the value of performance in Equation 1 (Figure 7.2) and Equation 2 
(Figure 7.6). The value, .015, should produce learning at a rate somewhat 
slower than that pictured in Figure 7.4 in which the relevant values were 
.0180 and .0185. Thus, each time the computer actually chose Side 1, the 
value of the tendency to go to Side 1 was increased at the relative 
expense of the value of the tendency to go to Side 2. 
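
The reinforcement rule can be written out as a short Python sketch. The function name reinforce and its parameter names are hypothetical labels introduced here; the rate (.015), the asymptote (1.0), and the starting value (.500) come from the text. Repeating the rule n times from .500 gives 1 - .5(.985)^n, which the sketch uses as a check.

```python
def reinforce(tendency, rate=0.015, asymptote=1.0):
    """Move the tendency a fixed fraction of the remaining distance toward the asymptote."""
    return tendency + rate * (asymptote - tendency)

value = 0.5
history = [value]
for _ in range(100):          # 100 rewarded choices of Side 1
    value = reinforce(value)
    history.append(value)

# Closed form for the same process: 1 - (1 - start) * (1 - rate) ** n
closed_form = 1.0 - 0.5 * (1.0 - 0.015) ** 100
print(f"after   1 reward : {history[1]:.4f}")
print(f"after 100 rewards: {history[100]:.4f} (closed form {closed_form:.4f})")
```

The first reward raises the Side 1 value from .5000 to .5075; each later reward adds a smaller amount, so the value approaches 1.0 without reaching it.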

COMPUTER LANGUAGES 

Communication with a computer involves the use of mutual lan- 
guages, languages that can be understood by both the operator and the 
computer. The “operational core” of the computer is able to deal with 
only two symbols, 0 and 1. In order to make calculations in terms of 
decimal numbers, it is necessary to translate one number system into the 
other. In order to make it easy for the ordinary user, translating programs 
have been written which perform this function routinely. Furthermore, 
many of the operations the computer is asked to perform are 
needed repeatedly. Therefore, a library of operational instructions has 
been developed which the computer can be asked to use on the basis of 
a simple request without going through the details of writing the instructions 
each time someone wants a given and common operation performed. 
The symbols that tell the computer what to do, including the symbols 
asking it to perform common operations, form a language that the computer 
understands perfectly and that human operators can learn to use. 

The simple learning model detailed above was programmed in the 
MAD language. MAD stands for Michigan Algorithm Decoder. An 
algorithm is a set of rules for solving a problem. The MAD language is 
not too difficult to learn and is a general computer language rather 
widely available. 

Table 7.1 is the program in MAD that describes the problem detailed 
above. Some of it may look strange because it comes so close to normal 
English. However, there are differences. Some of them are: there are no 
lower case letters in MAD, spellings that appear odd have a function, and 
nonalphabetic symbols tend to mean very different things to the computer 
than they ordinarily mean on a printed page. The program was included 
here only to illustrate what one looked like, and the reader is not expected 
to understand it unless he learns the MAD language or already knows it. 

The program in Table 7.1 will calculate the performance of one rat 
under the rules laid down above. Table 7.2 contains a part of the product 
of the calculations made for one rat. A total of 200 trials were run, and 
Table 7.2 contains the results for 60 of those trials. This simulated rat 
happened to choose the correct side four times in the first 10 trials, seven 
times in the second 10, nine times in the block from 51-60, eight in the 
block 91-100, eight in the block 141-150, and 10 in the final block of 
the run. The performance would not be atypical of a real rat if his 
motivation was not high and the reward was small. 
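
The block counts quoted above can be checked directly against the printout. The sketch below is an illustration only: the list sides is transcribed here from trials 1-20 of Table 7.2, and the helper name correct_per_block is hypothetical.

```python
# Sides chosen on trials 1-20, transcribed from Table 7.2 (Side 1 was the rewarded side).
sides = [2, 1, 1, 1, 1, 2, 2, 2, 2, 2,
         1, 1, 2, 1, 1, 1, 1, 2, 1, 2]

def correct_per_block(choices, correct_side=1, block=10):
    """Count choices of the rewarded side within each successive block of trials."""
    return [choices[i:i + block].count(correct_side)
            for i in range(0, len(choices), block)]

counts = correct_per_block(sides)
print(counts)  # [4, 7]
```

The two counts match the four and seven correct choices reported for the first and second blocks of 10 trials.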

Four simulated rats were run on the computer, and the results were 
combined for the four in blocks of 20 successive trials. The learning curve 
that resulted is shown in Figure 7.8. It can be seen from the figure that 
the results of the simple computer simulation are essentially indistinguishable 
from results which might have been obtained in an experiment 
involving four real rats. The speed of the computer is such that a total of 
15.8 seconds were required to run this simulation, of which only 2.4 seconds 
were required for the actual processing of the 800 simulated trials. 
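
For readers without access to MAD, the whole procedure can be approximated in Python. This is a minimal sketch under stated assumptions, not the original program: the function names run_rat and percent_correct_by_block, the seeds, and the use of random.gauss are choices made here, so the individual choices will not reproduce Table 7.2 or the exact curve of Figure 7.8.

```python
import random

def run_rat(n_trials=200, sd=0.2, rate=0.015, correct_side=1, seed=None):
    """Simulate one rat; return the side (1 or 2) chosen on each trial."""
    rng = random.Random(seed)
    means = [0.5, 0.5]                      # initial tendencies for Side 1 and Side 2
    choices = []
    for _ in range(n_trials):
        values = [rng.gauss(m, sd) for m in means]
        side = 1 if values[0] > values[1] else 2
        if side == correct_side:            # reward: strengthen the chosen tendency
            i = side - 1
            means[i] += rate * (1.0 - means[i])
        choices.append(side)                # an incorrect choice changes nothing
    return choices

def percent_correct_by_block(rats, correct_side=1, block=20):
    """Pool all rats and give percent correct in each successive block of trials."""
    n_trials = len(rats[0])
    result = []
    for start in range(0, n_trials, block):
        hits = sum(r[start:start + block].count(correct_side) for r in rats)
        result.append(100.0 * hits / (block * len(rats)))
    return result

rats = [run_rat(seed=s) for s in range(4)]  # four simulated rats, 800 trials in all
curve = percent_correct_by_block(rats)
for number, pct in enumerate(curve, start=1):
    print(f"block {number:2d}: {pct:5.1f}% correct")
```

Changing the seeds gives a different set of four rats, in the same way that a second run of the experiment would give a somewhat different learning curve.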

The problem that was used to write the program was kept simple in 
order to provide an illustration of the nature of computer simulation. 
With a simple program, two virtues can be demonstrated. The first is the 
increase in clarity and precision of statement of the principles required 
to write a program. Vague statements cannot be programmed. The second 
virtue is also clear. It is important to know whether a particular set of 
simple principles will in fact produce results that are similar to actual 
experimental results. 

However, the very simplicity of a simple problem prevents the pos- 
sibility of demonstrating one of the more interesting of the virtues of 
computer simulation. Often the phenomenon to be simulated is sufficiently 
complex so that without simulation it is difficult or impossible to anticipate 
whether the set of principles selected for the program will actually pro- 
duce the results they are thought adequate to produce. When this is true, 





Table 7.1 

Computer program for simple learning 


MAD (03 JAN 1966 VERSION) PROGRAM LISTING 

DIMENSION CHOOSE(2), MEAN(2) 
INTEGER TRIAL, CORECT, LAST 
MEAN(1) = .5 
MEAN(2) = .5 
START = 0 
READ DATA SD, LEARN 
PRINT COMMENT $1 RAT LEARNING TO RUN A T MAZE$ 

PRINT FORMAT $1H ,S2, H I- PARAMETERS ARE .5,H-h 

STANDARD DEVIATION f, F7 3, H-f, LEARNING 
RATE -h, F7 3 ’SD. LEARN 

READ AND PRINT DATA LAST 
READ AND PRINT DATA CORECT 
THROUGH NEXT, FOR TRIAL = 1, 1, TRIAL .G. LAST 
PRINT FORMAT $H-f- TRIAL NUMBER-hJ3*S. TRIAL 
CHOOSE(1) = RANDND.(MEAN(1),SD,START) 
CHOOSE(2) = RANDND.(MEAN(2),SD,START) 
WHENEVER CHOOSE(1) .G. CHOOSE(2) 
PRINT COMMENT $0THE RAT CHOSE SIDE 1$ 
WHENEVER CORECT .E. 1 
PRINT COMMENT $ CORRECT$ 
MEAN(1) = MEAN(1) + LEARN * (1.-MEAN(1)) 
END OF CONDITIONAL 
OTHERWISE 
PRINT COMMENT $0THE RAT CHOSE SIDE 2$ 
WHENEVER CORECT .E. 2 
PRINT COMMENT $ CORRECT$ 
MEAN(2) = MEAN(2) + LEARN * (1.-MEAN(2)) 
END OF CONDITIONAL 
END OF CONDITIONAL 
NEXT CONTINUE 
END OF PROGRAM 


the program will sometimes produce something quite unexpected. In this 
event, ingenuity is often required to determine which part of the program 
was not needed, or what additional principles are required before the 
program produces an acceptable simulation of the behavior. In writing a 
new program, it usually becomes profitable to revise one's thinking about 

Figure 7.8 

A learning curve for four rats learning a T maze 
problem, as simulated on a computer (see text for 
explanation). Horizontal axis: blocks of 20 trials. 

the nature of the phenomena that are being simulated. The process is 
rich in potential for producing new formulation and discovery. The 
computer, with its potentiality for simulation, is neither a substitute for 
thinking nor a replacement for experimentation, but it can be an important 
aid in improving both processes. 


Table 7.2 

RAT LEARNING TO RUN A T MAZE 

PARAMETERS ARE STANDARD DEVIATION = .200 
LEARNING RATE = .015 

LAST = 200* 

CORRECT = 1* 

TRIAL NUMBER   1   THE RAT CHOSE SIDE 2 
TRIAL NUMBER   2   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER   3   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER   4   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER   5   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER   6   THE RAT CHOSE SIDE 2 
TRIAL NUMBER   7   THE RAT CHOSE SIDE 2 
TRIAL NUMBER   8   THE RAT CHOSE SIDE 2 
TRIAL NUMBER   9   THE RAT CHOSE SIDE 2 
TRIAL NUMBER  10   THE RAT CHOSE SIDE 2 

TRIAL NUMBER  11   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  12   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  13   THE RAT CHOSE SIDE 2 
TRIAL NUMBER  14   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  15   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  16   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  17   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  18   THE RAT CHOSE SIDE 2 
TRIAL NUMBER  19   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  20   THE RAT CHOSE SIDE 2 

TRIAL NUMBER  51   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  52   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  53   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  54   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  55   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  56   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  57   THE RAT CHOSE SIDE 2 
TRIAL NUMBER  58   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  59   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  60   THE RAT CHOSE SIDE 1   CORRECT 

TRIAL NUMBER  91   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  92   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  93   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  94   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  95   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  96   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER  97   THE RAT CHOSE SIDE 2 
TRIAL NUMBER  98   THE RAT CHOSE SIDE 2 
TRIAL NUMBER  99   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 100   THE RAT CHOSE SIDE 1   CORRECT 

TRIAL NUMBER 141   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 142   THE RAT CHOSE SIDE 2 
TRIAL NUMBER 143   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 144   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 145   THE RAT CHOSE SIDE 2 
TRIAL NUMBER 146   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 147   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 148   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 149   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 150   THE RAT CHOSE SIDE 1   CORRECT 

TRIAL NUMBER 191   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 192   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 193   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 194   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 195   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 196   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 197   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 198   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 199   THE RAT CHOSE SIDE 1   CORRECT 
TRIAL NUMBER 200   THE RAT CHOSE SIDE 1   CORRECT 